Factors Driving the Spatiotemporal Variation of Dissolved Organic Carbon Flux from Croplands in the Midwestern USA


 Improving understanding of dissolved organic carbon (DOC) cycling from farmlands to rivers is a challenge due to the complex influence of farming practices, the hydrology of predominantly flat lowlands, and seasonal snowpack effects. Monthly field DOC concentrations were measured throughout the year at sub-basin scale across the Chippewa River Watershed, which falls within the Corn Belt of the Midwestern United States. The observations from croplands were benchmarked against the data sampled from hilly forested areas in the Connecticut River Watershed. The Soil Water Assessment Tool (SWAT) was used to simulate daily soil water properties. This method tests for a framework for using the combination of new field data, hydrological modelling, and knowledge-based reclassification of Land Use/Land Cover (LULC) to analyze the predictors of both the spatial and temporal changes of DOC over farmlands. Our results show: 1) DOC concentrations from cropland baseflow were substantially high throughout the year, especially for spring runoff/snowmelt scenarios, 2) gradient analysis with spatial factors only was able to explain ~82% of observed annual mean DOC concentrations, and 3) with both spatial and temporal factors: [Evapotranspiration, Soil Water, and Ground Water], the analysis explained ~81% of seasonal and ~54% of daily variations in observed DOC concentrations.


INTRODUCTION 34
Terrestrial organic carbon in the dissolved form (DOC) is easily transported to inland or coastal 35 ecosystems (Tranvik 2014). Recent studies raised a public health concern that rivers receiving runoff 41 from croplands have elevated DOC concentrations at regional and worldwide scales (Bhattacharya and 42 The DOC dynamics in streamflow generated from agricultural landscapes are significantly different 68 than those reported for semi-natural habitats. This is especially true for the Corn Belt region of the 69 Midwestern USA, where flatland hydrology dictates temporal variations in DOC concentrations via 70 baseflow rather than overland surface runoff (Qiao et al. 2017). This is in sharp contrast to hilly forested 71 areas where the majority of annual DOC fluctuations are dictated predominantly by surface runoff 72 during rainfall events. Even over forested lands, it has been reported that low-flow conditions are ideal 73 in order to best understand the transport processes of nutrients (e.g., N and P) from the soil profile and In general, the factors driving spatiotemporal variation of DOC concentrations in streams of 76 croplands between rainfall events are not well understood. These temporal DOC variations are 77 assumably controlled by soil properties such as organic matter content, level of saturation, and 78 temperature. The spatial variations are likely associated with heterogeneity of the landscape, such as the 79 field observations collected across multiple years and seasons were paramount to better quantify the 124 significant DOC contributions from croplands in comparison to forested landscapes. 125 126

Field Sample Protocols 127
There were 278 water samples collected in total, from 26 sampling locations via 43 field visits 128 spread between the two study sites. Within the primary study site, a range of locations were selected and 129 organized into two sampling groups (crop dominated and mixed croplands) as displayed in Figure 1. The 130 13 sampling locations (IDs 1-13) associated with the first group focused on crop-dominated drainage 131 sub-basins. The 8 sampling locations (IDs 14-21) associated with the second group was added to include 132 drainage sub-basins dominated by the croplands mixed with a large proportion of forests and wetlands. 133 Monthly field visits were arranged between rainfall events to best reflect seasonal variations and to 134 reduce the influence of any single event. The entire cropland field campaign ran for 14 months 135 Field samples were obtained from all sampling locations using clean 500 mL bottles (acid washed) 148 to collect stream water for DOC analysis. Bottles were fully immersed while capping to ensure that no 149 air remained. The samples were immediately stored on ice until arrival at the lab for immediate 150 filtering. Water samples were filtered through pre-combusted glass-fiber filters (nominal 0.7 µm pore 151 size) to remove any non-dissolved organic matter, and the filtrate was then stored (acidified and 152 refrigerated) until analysis for DOC content. All Laboratory processes were completed within 12 hours 153 of sample collection. The DOC concentration of each water sample was measured using a Shimadzu 154 TOC-V analyzer with high temperature combustion (Vlahos et al. 2002 National Land Cover Data (NLCD) data from 2011 was used because this date was closest to our 160 sampling dates (2012-2014). The original baseline NLCD data referenced 13 land cover categories. We 161 reclassified these baseline land cover data into three more broadly defined classes according to their 162 general DOC transforming rates based on preliminary modelling runs ( Figure 2). The percent-areal 163 composition of the three new classes extracted from the NLCD data were calculated for each drainage 164 sub-basin in the primary study site and are shown in Table 1. These data were used to analyze DOC 165 dynamics at the sub-basin level. It is important to note that land cover data was needed for the entire 166 area draining to an individual sampling location and not just the sub-basin in which the sampling 167 location resided. For instance, sampling point 15 receives runoff not only from its home (immediate) 168 sub-basin, but also from upstream sub-basins that contained points 16, 17 and 18. 2018. In addition, spatially explicit, high resolution (4 km 2 grid) daily weather forcing data 178 (precipitation, maximum temperature, and minimum temperature) were downloaded from the PRISM 179 website (https://prism.oregonstate.edu/recent/) and also used as SWAT inputs. Daily discharge flow of 180 the Chippewa River was obtained from the USGS 04154000-gauge station and used to calibrate the 181 SWAT model. LULC types were extracted from NCLD 2011 as described above and used as inputs. 182 Soil input parameters were extracted from SWAT's built-in STATSGO data. Elevation data were 183 extracted from a 30m Digital Elevation Model (DEM) of Isabella County, Michigan. As a result, the 184 SWAT model delineated our primary study site into 126 sub-basins. 185 186 Next, Chippewa River daily flow data were categorized into high (top 10 percentile), medium 187 (10th-50th percentile), and low (lowest 50 th -100 th percentile) flows. The SWAT model was then 188 calibrated through a multi-objective auto-calibration for these three flow regimes for the period from [s1(i,t), s2(i,t), … sn(i,t)] be the n exploratory seasonal (temporal) variables for the sub-basin i and Julian 205 day t, for t = 1, 2…Q. Q is the last Julian day of field sampling visits. Each variable set X(i) were split in 206 three categories: land cover (e.g., percent cropped area, percent forested), soil properties (e.g., percent 207 silt, percent organic matter), and geomorphology (e.g., sub-basin area, average slope). The spatial 208 variables were derived from each drainage area. The seasonal variables S(i, t) were split into two 209 categories: hydrologic characteristics (e.g., SW: soil water content, PET: potential evapotranspiration, 210 GW: ground water volume) and weather inputs (e.g., point specific precipitation and temperatures). The 211 observed DOC, y (i, t) corresponded with all sampling locations i, and the Julian days t of field visits. 212 The observed values y (i, t) for all is and ts were fitted to eq. 1 for estimating the parameters 213

214
Statistical metrics used to evaluate relative importance and/or inclusion of variables into the models 215 were p-value, coefficient of determination (R 2 ), and F values reflecting the overall significance of each 216 regression model. An alpha of 0.05 was used as the threshold to determine if any one variable was 217 statistically meaningful. The analysis included all exploratory variables (in varying combinations) within 218 each category (temporal, spatial, physical, and biological). Our objective was to identify which variables 219 were significant at various temporal scales (i.e., daily, monthly, or annually). Accordingly, all 220 associated data had to be averaged when moving to a more course temporal scale. For example, ~30 221 daily hydrologic variables were averaged to yield a single monthly value. Similarly, ~365 daily 222 hydrologic variables were averaged to yield an annual average. Ultimately, a variety of variables were 223 fit to both raw and averaged (monthly and yearly) DOC observations to obtain the desired coefficients of 224 the linear model. 225

226
One of our research objectives was to evaluate the appropriate spatial scale (i.e., contributing 227 hydrologic areal extent) for quantifying the inherent relationship between a variety of independent 228 variables and the variation of observed DOC concentrations from cropland areas. Here, we designated 229 stream segment i and sub-basin i as the immediate stream segment and sub-basin in which the sampling 230 location i resides. With this designation, this study tested two spatial scale scenarios: 1st-order 231 extended sub-basins and non-extended drainage areas. All 1st-order extended sub-basins met one of the 232 following two criteria: (1) all sub-basins associated with sampling sites within a 1st-order stream, such 233 as sampling locations 4, 6, 7, 9, 10, 11, 13, 16, 18, 19, and 20, or (2) all sub-basins associated with 234 sampling sites within a 2nd or 3rd order stream if they had the same dominant land cover as all their 235 upper-tributary sub-basins. Sampling locations 5, 15, and 17 met this second criterion and were 236 designated as extended drainage sub-basins. For example, sub-basin 5 resides in a 2nd-order stream and 237 its dominant land cover is identical to that of its upper tributary sub-basin, which in this case was sub-238 basin 4 (1st-order). Thus, the modelled drainage area associated with sampling location 5 included both 239 the areal extent of the upstream 1st-order sub-basin 4 and its immediate 2nd order sub-basin 5. 240 Therefore, the extended sub-basins in this project consisted of 14 sub-basins: 4, 6, 7, 9, 10, 11, 13, 16,

261
Where the first three variables (i.e., crp, frt, wetl) were percent of crops, forest, and wetland, and dum 262 was the binary dummy variable useful for separating crops and mixed land covers (crop = 1, mixed = 0). 263 All coefficients for each of the four variables and the y-intercept were significant with respect to the basins into a more integrative model for use at a larger or regional scale. In other words, integrative 300 modelling can indeed estimate DOC concentrations within high order stream segments by cumulatively 301 aggregating or mixing DOC loadings from modelled upstream sub-basins (Tian et al. 2002). This 302 integrative and hydrologically cumulative approach appears to better approximate the reality of how the 303 many interconnected watershed components exchange materials from 1st-order to higher-order 304 downstream receiving waters. 305 Both our statistical analyses and encouraging modelling outputs of DOC dynamics at the annual 307 scale advances the scientific communities understanding of how anthropogenic agricultural land use 308 changes and related management practices can help to explain spatial disparities of DOC concentrations 309 originating from croplands. Our results also highlight the need for comprehensive in-situ sampling 310 campaigns across different ecoregions and climatic zones to help fine tune regional and global models. 311 The variables identified in eqs. 2 and 3 are extractable from satellite images, which helps bridge the gap 312 from more localized modelling efforts to more regional and global spatial scales that take advantage of 313 remote sensing technologies. Thus, this investigation verified the feasibility of using easily-obtainable 314 satellite image data to detect DOC dynamics and to study the impacts of human activities and 315 management policies on both land sustainability and the ecology of freshwater habitats. 316 317

Seasonal DOC trends from crop and forest lands 318
Strong seasonal patterns in DOC concentrations were inherent in our data, and the associated driving 319 factors became readily apparent from the analysis of daily and monthly DOC export rates. The patterns 320 in Figure 4A  December (correlation > 0.65). We feel confident that these parallel DOC export curves are seasonal 358 effects rather than of individual storm events since the curves were derived from a large set of multi-359 year, independent field measurements (i.e., 50 field visits). DOC spikes in May were triggered by 360 increased soil temperatures followed by elevated ground water volumes (Figure 4). It seems logical that 361 soils at depth take a longer time to warm up than the land surface temperature. Thus, the flux of DOC 362 stored in such soil profiles were delayed until May at-depth soil temperatures began to rise ( Figure 4A). 363 High ground water levels and low soil water content were exhibited from the late April to early May as 364 simulated via SWAT (i.e., black curve in Figure 5). Further examination of Figure 5  The mean monthly DOC concentration for croplands were more than 3-fold greater ( Figure 4A) 374 compared with those for forestlands ( Figure 4B) throughout the year. The monthly precipitation and 375 temperature patterns between the two study sites were very comparable. Therefore, the excessive soil 376 DOC production rates from croplands were caused by anthropogenic effects in terms of crop residue 377 accumulation and related break-down. Our previous study's preliminary mesocosm experiments 378 revealed that corn plants have a slower metabolic process for transforming foliar organic matter into the 379 dissolved form when compared to deciduous (D) and evergreen (E) foliage(Li et al. 2018). Figure 6  380 illustrates how DOC production rate changed for the same applied biomass (260-g dry biomass), the 381 same number of incubation days, and two different temperatures (L:20 °C or H:25°C) for these three 382 vegetation types. Figure 6 illustrates that DOC transformation (i.e., foliar to dissolved) rates were often 383 2.5 times faster for forest foliage as compared to sweet corn foliage. Intuitively, the amount of organic 384 biomass of croplands must therefore be 5-6 times higher than that of forested lands to result in the 3-fold 385 greater DOC concentrations. Clearly both farming and crop residue practices cause increased DOC 386 concentrations originating from croplands as opposed to climate or temperature variations. 387 388

Variables attribute to spatial and temporal DOC variations in croplands 389
Seven variables were identified as being statistically significant in driving DOC variation (eq. 5) in 390 reference to finer spatial and temporal scales. Three out of these seven variables were related to seasonal 391 effects that acted to quantify combined hydrologic and climatic properties: soil water content (sw in 392 mm), ground water volumes (gw in mm), and potential evapotranspiration (pet in mm). The remaining 393 four variables were related to the spatial characteristics of any given sub-basin (i.e., crp, frt, wetl and 394 dum). Note that these spatial variables are the same as those found to be significant when analyzing 395 mean annual DOC dynamics. Specifically, for an area draining to sampling location i, and at a particular 396 Julian day, t, seasonal variable set S and spatial variable set X are: 397 398 frt(i), wetl(i), and dum(i)] for all i = 4, 5, 6, 7, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20 399 for all t = Julian day when had a field sampling