The Emet-Orhaneli Basin consists of agricultural lands, forests, and rangelands. However, the proportions of LULC classes differed among the data sources (Fig. 2 and Fig. 3), which could affect the hydrological responses of the calibrated models. Accordingly, the results are discussed in three parts. First, the performance metrics (NSE, R2, and KGE) were compared against the observed streamflow data at the basin outlet. Second, mean monthly and annual ET, surface runoff, and water yield distributions were evaluated over the entire study period (1980–2012). Finally, spatial changes in surface runoff values were examined at the subbasin scale.
In agreement with previous studies [19, 21–23], the calibrated streamflow (water yield) results in this study varied only minimally across the LULC datasets. This was demonstrated here for six LULC datasets, two of which had not been tested before (GLCNMO V1 and PELCOM). All datasets exceeded the acceptable level of calibration performance [39]. Results were evaluated with the NSE metric, which is widely used to assess model performance. According to NSE, GlobCover 2005 gave a slightly better calibration result (0.66) and GLCC gave the worst (0.61), while the other datasets ranged between 0.64 and 0.65. In the validation period, however, the metric values decreased, and the GLCC data had an NSE value below 0.50. The poor NSE performance of the GLCC data may be attributed to factors such as the aggregation of LULC classes and the low resolution of the dataset [17, 22]. However, the relatively better performance of other coarse-resolution data suggests that resolution alone may not be the primary reason for the GLCC performance. Beyond the performance metrics, the simulated hydrographs were very close to each other. As an example, slight differences can be observed in the peak conditions of the 1981 and 2002 water years (Fig. 4b-c). GLCC had a higher peak value than the other datasets; however, no particular trend differences were found throughout the entire study period. Given these small differences, the use of low-resolution LULC data could be more desirable since it offers the advantage of minimizing processing and calibration efforts [21].
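To make the compared metrics concrete, the following minimal sketch computes NSE, R2, and KGE for a pair of observed and simulated series. It is an illustration under stated assumptions, not the procedure used in this study: R2 is taken as the squared Pearson correlation, KGE follows the original three-component formulation (correlation, variability ratio, bias ratio), and the function names and sample series are hypothetical.

```python
import math
from statistics import fmean


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared errors
    to the variance of the observations around their mean."""
    mean_obs = fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pearson_r(obs, sim):
    """Pearson correlation coefficient between the two series."""
    mo, ms = fmean(obs), fmean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def r2(obs, sim):
    """Coefficient of determination, assumed here to be r squared."""
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed variant: original formulation
    with alpha = std ratio and beta = mean ratio)."""
    r = pearson_r(obs, sim)
    mo, ms = fmean(obs), fmean(sim)
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / len(obs))
    sd_s = math.sqrt(sum((s - ms) ** 2 for s in sim) / len(sim))
    alpha = sd_s / sd_o
    beta = ms / mo
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical monthly flows (not data from this study)
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
doubled = [2.0 * q for q in observed]   # perfectly correlated, biased
flat = [3.0] * 5                        # equals the observed mean

print(nse(observed, observed), r2(observed, doubled), kge(observed, doubled))
```

A simulation equal to the observed mean gives NSE = 0, while a perfectly correlated but biased simulation keeps R2 = 1 and is penalized by NSE and KGE. This is one reason the three metrics are usually reported together.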
For annual and monthly means in the simulation period, ET and water yield showed nearly identical trends for all LULC datasets, while the surface runoff values differed more noticeably. Moreover, water yield was overestimated relative to the observations in all datasets (Fig. 5c). In addition, the ET/PREC, SURQ/PREC, and WYLD/PREC ratios were on the order of 0.5, 0.1, and 0.2, respectively (Table 4). The SURQ/PREC ratio was more sensitive to the LULC data than the other ratios (ET/PREC and WYLD/PREC). This result indicates that different LULC data affect surface runoff more than ET and water yield. Figure 5b illustrates that GLCC (87% agricultural) had the highest surface runoff values, while PELCOM (70% brush) had the lowest. The surface runoff values of the GLCC data are extreme compared to the others. This dataset stands out from the others because it underrepresents forested areas within the basin.
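The comparison of water balance ratios can be sketched as follows. The sensitivity measure used here (range across datasets divided by their mean) is an assumption chosen for illustration, since the text does not specify one, and the annual values are hypothetical placeholders rather than the values in Table 4.

```python
def balance_ratios(prec, et, surq, wyld):
    """Return each water balance component as a fraction of precipitation."""
    return {
        "ET/PREC": et / prec,
        "SURQ/PREC": surq / prec,
        "WYLD/PREC": wyld / prec,
    }


def relative_range(values):
    """Spread across datasets: (max - min) / mean (assumed measure)."""
    values = list(values)
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


# Hypothetical annual means in mm (NOT the values from Table 4)
example = {
    "dataset_A": dict(prec=600.0, et=300.0, surq=60.0, wyld=120.0),
    "dataset_B": dict(prec=600.0, et=306.0, surq=90.0, wyld=126.0),
    "dataset_C": dict(prec=600.0, et=294.0, surq=30.0, wyld=114.0),
}

ratios = {name: balance_ratios(**vals) for name, vals in example.items()}
spread = {
    key: relative_range(r[key] for r in ratios.values())
    for key in ("ET/PREC", "SURQ/PREC", "WYLD/PREC")
}

for key, value in spread.items():
    print(f"{key}: relative range = {value:.2f}")
```

Because surface runoff is a small fraction of precipitation, the same absolute change in millimetres produces a much larger relative change in SURQ/PREC than in ET/PREC, which is consistent with the higher sensitivity described above.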
While the monthly and annual average values of the simulations allowed us to interpret the overall results for the entire basin, visualizations were made to examine the changes at the subbasin scale. Similar to [23], spatial differences in surface runoff values were more striking than those in water yield values. For this reason, surface runoff variations are presented at the subbasin scale in Fig. 6. Figure 6 shows that the GLCC data exhibit a higher surface runoff variation than the other datasets. This difference may be caused by the distribution of the highly sensitive CN2 parameter in the basin, which varies spatially at the HRU level. When the GLCC data were used in the SWAT model, agriculture-dominant HRUs were produced, which resulted in high CN values and therefore high surface runoff values. On the other hand, low values were observed with the PELCOM data because HRUs dominated by the brush class were produced. In Fig. 7, the differences of each dataset from the mean values at the subbasin scale indicate that GLCC produces higher surface runoff values and PELCOM produces lower values. The figure thus reveals the subbasins in which surface runoff deviated above or below the average. The GLCC subbasins have higher-than-average values (differences of more than 30 mm), while the PELCOM subbasins have extreme low values (differences of less than 20 mm). In the subbasins of GlobCover 2005, no values deviate by more than 20 mm from the average; however, some values fall 10 to 20 mm below the average. The subbasins of the other three datasets (GLC 2000, GLCNMO V1, and CLC 1990) differ within ±10 mm and are closer to the mean values. Also, in Fig. 8, GLCC appears to have larger variability than the other five datasets. Besides the difference in mean (red dots) and median, the box and whisker lengths indicate a wide range of surface runoff values across the 43 subbasins for GLCC.
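The link between curve number and surface runoff can be illustrated with the SCS curve number equation. The sketch below assumes that the curve number method is the runoff option in use and that the initial abstraction is 0.2 times the retention parameter. The function names and the CN values of 80 and 60 are hypothetical, chosen only to contrast a higher-CN and a lower-CN land cover, and are not the calibrated CN2 values of this study.

```python
def retention_mm(cn):
    """Retention parameter S in mm for a given curve number."""
    return 25.4 * (1000.0 / cn - 10.0)


def scs_runoff_mm(rain_mm, cn, ia_ratio=0.2):
    """Daily surface runoff in mm from the SCS curve number equation.

    Runoff is zero until rainfall exceeds the initial abstraction
    Ia = ia_ratio * S (ia_ratio = 0.2 is the assumed default).
    """
    s = retention_mm(cn)
    ia = ia_ratio * s
    if rain_mm <= ia:
        return 0.0
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


# Hypothetical 50 mm rainfall event on two illustrative curve numbers
high_cn_runoff = scs_runoff_mm(50.0, 80.0)   # higher CN, e.g. cropland-like
low_cn_runoff = scs_runoff_mm(50.0, 60.0)    # lower CN, e.g. brush-like

print(f"CN 80: {high_cn_runoff:.1f} mm, CN 60: {low_cn_runoff:.1f} mm")
```

For the same rainfall depth, the higher curve number yields substantially more runoff, which is the mechanism suggested above for the higher surface runoff of agriculture-dominant HRUs relative to brush-dominant HRUs.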