2.1 Study area
In the Andean basins are generated the largest rivers in South America, in Colombia the main rivers and their tributaries are born in the Andean region (Restrepo & Syvitski, 2006), much of the hydrology of the Andes is made up of lakes and marshes of diverse origin and characteristics. It is estimated that the number of bodies of water with a surface area greater than 100 square meters in the Colombian mountain ranges is approximately 1,500 (Diaz Merlano, 2018).
Figure 1 shows the Coello river watershed, located on the eastern slope of the central mountain range in the Department of Tolima. The main riverbed originates from the Nevado del Tolima, a volcanic mountain with an altitude of 5200 m.a.s.l. It is one of the most economically and socially important tributaries in the Department of Tolima, since it is a source of drinking water for more than 600,000 inhabitants. The Coello River runs a length of 124.7 km from source to mouth, draining an area of 1842 km2. The main economic activity in the basin is agriculture, comprised mainly of corn, coffee, bananas, beans, cassava, vegetables, cotton, sugar cane, soybeans, rice and tobacco. There is also livestock, poultry and swine farming (CORTOLIMA, 2006). The Corporación Autónoma del Tolima - CORTOLIMA (environmental entity of the Department of Tolima) classified the vegetation cover of the Coello river basin through photographs, identifying semi-permanent and permanent crops, pastures, forests, natural shrub vegetation, areas of agricultural exploitation, and areas without agricultural or forestry activities.
2.2 Methods of land cover change models
The methods development for determining land use/land cover (LULC) change is depicted in Fig. 2.
2.3 Data and satellite images
Landsat images were obtained from the United States Geological Survey (USGS) for the years 2001, 2003, 2015 and 2019. Those years were selected because they clearly show the land use in the watershed. Two images were downloaded for each year to cover the entire study area. Table 1 specifies the characteristics of the images acquired for the study.
Table 1
Satellite information from USGS images for 2001, 2003, 2015 and 2019 for the Coello river watershed
Sensor | Date of acquisition | Resolution | Sensor | Date of acquisition | Resolution |
Landsat 7 ETM | 16/07/2001 | 30 m | Landsat 7 ETM | 18/04/2001 | 30 m |
Landsat 7 ETM | 01/11/2003 | 30 m | Landsat 7 ETM | 02/01/2003 | 30 m |
Landsat 8 OLI | 11/01/2015 | 30 m | Landsat 8 OLI | 22/12/2015 | 30 m |
Landsat 8 OLI | 03/09/2019 | 30 m | Landsat 8 OLI | 17/09/2019 | 30 m |
The images were imported to the ENVI software package (Exelis Visual Information Solutions) which applied a radiometric correction using the FLAASH method (Aguilar Arias et al., 2014). This method corrects geometric distortions based on the inclination of the sensor, the intervention of the relief and other systematic errors related to the quality of the image. This process is important for the accuracy and quality of the results. In our case, to identify land cover, the radiometric correction guarantees that the changes that are established correspond to real changes in vegetation cover and not to changes in the positioning of the images (Cabrera et al., 2014).
The Digital Elevation Model (DEM) from Shuttle Radar Topography Mission (SRTM) was obtained from the U.S. Geological Survey (USGS) with a spatial resolution of 30 m. The DEM also allowed for the determination of slope values and distances between roads in raster format, which was also later included in the land use and land cover modeling process.
2.3 Land cover classification
The land cover classification was carried out for the years 2001, 2003, 2015 and 2019 using a supervised classification methodology based on information from the sensor bands. Seven categories or classes of coverage in the watershed were defined, as shown in Table 2.
Table 2
Classification of seven types classes of vegetation cover
Type of coverage | Description |
Urban areas (UA) | Cities, towns and highways |
Water bodies (WB) | Rivers, lakes and lagoon |
Wooded areas (WA) | Areas of closed high canopy forest, tropical dry forest and urban forest |
Uncovered floor (UF) | Areas of bare soil, plowed land and excavation |
Clouds (C) | Areas totally covered by clouds |
Agricultural (AG) | Farming land |
Low vegetation (LV) | Clean grasses, shrubs, plantations, gardens, recreational areas within the city |
The classification method used was the supervised classification technique based on a Gaussian mixture model. This method focuses on an automated learning algorithm that memorizes the characteristics of an image (Sejati et al., 2019). Due to the different categories, the training zones or Regions of Interest (ROI) were delimited by layers of polygons, where their attributes were associated with the characteristics for each class. Different ROIs were generated for various combinations of bands, since they allow better visualization of the conditions on the image. Between 150 and 300 polygons were assigned for each year in order to represent the training areas for the classification methodology. This classification methodology consists of grouping together pixels that represent the same category (Mather & Tso, 2016), identifying types of coverage and distinguishing areas or zones according to the seven categories previously mentioned (Rojas Barbosa, 2019).
2.4 Modeling and predicting changes in vegetation cover
Three simulations were conducted for our study. For each, the image from 2001 was used as the starting image, while the images from 2003, 2015 and 2019 were considered the the most recent images, respectively. These were used evaluate the transition percentage between each one. To calculate the transitions, we used six of the seven categories proposed in Table 2, as the cloud category was omitted.
2.5 Change analysis
All of the possible coverage transitions for each year were analyzed. The module we used has a series of sections for the evaluation of gains, losses, persistence and transitions in the form of maps, graphs or quantitative data (Eastman, 1999). In order to identify the changes, the year 2001 was considered as the base year. The analysis therefore covered the 2001–2003, 2001–2015 and 2001–2019 periods.
2.6 Additional data
We considered the exploratory variables to be those that directly impact modifications in land cover and land use. These variables included topography features such as elevation and terrain slope that may favor or restrict urban expansion, types of land coverage and anthropogenic activities (Wang & Maduako, 2018). Proximity factors such as distance to roads may also influence urban sprawl given it provides convenient access to basic services for the people living nearby, a phenomenon known as the neighborhood effect. This means that when surrounded by built-up areas or roads, a pixel of a different category tends to eventually transform into that of an urban area (Ye et al., 2013). In our study, we selected variables that were expected to influence the change in land coverage (Dzieszko, 2014), such as elevation, slope, distance to roads and evidence likelihood of use.
2.7 Neural network model: Multi-layer Perceptron (MLP)
After the coverage types were defined, an artificial neural network was used to evaluate changes from to the interactions between the explanatory variables and the transitions (Shen et al., 2020). Four layers of coverage were used for the analyzed years and the changes between each of those years were evaluated. Then, as described earlier, the standard configuration was applied in which 50% of the pixels were used for MLP training and the remaining 50% were used for model validation and testing. This method is recommended and supported in the literature (Silva et al., 2020). The model calibration required 10,000 iterations since when observing the curve, the error decreased and the curve stabilized due to the increase of iterations.
2.8 Markov Model
To predict LULC changes in the study area, we integrated the CA-Markov method with the TerrSet software program (Rahnama, 2021, pp. 2016–2030). Markov is a stochastic model that is commonly used to simulate and predict land cover types (Kamusoko et al., 2009). It is based on the theory that future changes depend mainly on the current state and is applied to continuous and changing land surfaces (Mansour et al., 2020). For our study, the Markov model explained the transition states that occur between one category of land cover and another and the probability of those changes occurring (Sang et al., 2011). This process involved (i) generating the transition matrices from the land cover maps using 2001 as the base year, (ii) generating transition maps according to land cover type, (iii) calculating the Kappa index to determine the accuracy of the model and (iv) simulating land cover for the year 2050.
The transition matrix represents the pixels that change from one type of land cover to another over the number of times determined in the model. This probability is represented mathematically in the following equation.
$$P={P}_{ij}\left[{P}_{11} {P}_{12}\dots {P}_{1n} {P}_{21} {P}_{22}\dots {P}_{2n} {P}_{n1} {P}_{n2}\dots {P}_{nn} \right]$$
1
Where P is the Markov transition matrix, i and j are the cover types in the first and second years, Pij is the probability that cover type i changes to type j, and N is the number of classes in the study area (Mansour et al., 2020).
2.9 Validation of coverage prediction
To validate the predictive model, the Validate module was used and the Kappa method was applied. The Kappa method measures the goodness of fit between the initial map and the prediction map, unlike traditional statistics. The Kappa index formula is shown below:
Kappa = (Po - Pe) / (1 - Pe) (2)
where
Po = (a + d) / N
Pe = [(a + b) * (a + c) + (c + d) * (b + d)] / N^2
N = a + b + c + d
a = number of true hits (correctly classified as belonging to the class of interest)
b = number of false positives (incorrectly classified as belonging to the class of interest)
c = number of false negatives (incorrectly classified as not belonging to the class of interest)
d = number of true hits (correctly classified as not belonging to the class of interest)
The Validate module divides the index into several components, where each one expresses a special form of Kappa (Araya & Cabral, 2010; Chowdhury et al., 2021). Kappa values below 0.40 are categorized as poor, values between 0.40 and 0.75 are categorized as good and values above 0.75 are considered excellent (Roy et al., 2014). Similarly, ROC is represented by a graph between true values (y-axis) and false positive values (x-axis). The index ranges between 0 and 1, where 1 is a perfect fit and 0.5 is a random relationship between both maps. Values lower than 0.5 indicate an incorrect model (Camacho Olmedo et al., 2015).
2.10 Compilation of information for hydrologic simulation and hydroclimatic data
Hydro-BID is a hydrological process software developed by the Inter-American Development Bank (IDB) (Yáñez San Francisco et al., 2023), the software uses the Analytical Hydrology Data for Latin America and the Caribbean regions. Hydro-BID which allows the evaluation of water quantity and quality in watersheds. The rainfall-runoff model employs the Watershed Loadong Function (GWLF) model. The model requires as input data precipitation in centimeters (cm), temperature in degrees Celcius (ºC), time series of flows in cubic meters per second (m3/s).
These data were acquired through the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM-Colombia), from 17 pluviometric stations, 23 standard climatological stations and 3 limnimetric and liminigraphic stations. The period of data analyzed and used in the model was from January 1, 1981, to December 31, 2016(35-year period).
2.11 Hydro-BID Parameterization
Changes in land use can impact runoff and therefore the flows in a watershed. Hydro-BID, through the Customized Parameterization Tool, allows data on land use to be entered in order to parameterize the different key variables established in the software and thus calculate daily runoff (Mena et al., 2019). Data on soil texture characteristics were used after being extracted from the global database of soils from the Food and Agriculture Organization of the United Nations (FAO). The data were organized into the different layers and CSV files in order to calculated the available water capacity and curve number.
2.12 Calibration of the hydrological model
The hydrological model was calibrated manually by trial and error in two stages. In the first stage, the upstream zone of the basin, more specifically the sub-basin with identifier 301488200, was considered. The second stage considered the downstream zone with sub-basin 301527100. The observed flow data were collected from the Puente Carretera (21217120) and Payandé (21217070) boundary stations, respectively.
2.13 Simulated scenarios
Seven scenarios were chosen for the simulation: one which presents the current conditions, the second which represents the incidence of land use change, three scenarios that show the effect of climate change (Representative Concentration Pathway; RCP 2.6, 4.5 and 8.5) and three other scenarios that mix land use change and climate change.
-
Baseline: Initial watershed conditions
-
Scenario 1: Baseline conditions + land use/land cover change
-
Scenario 2: Baseline conditions + RCP 2.6
-
Scenario 3: Baseline conditions + RCP 4.5
-
Scenario 4: Baseline conditions conditions + RCP 8.5
-
Scenario 5: Change in land use/land cover + RCP 2.6
-
Scenario 6: Change in land use/land cover + RCP 4.5
-
Scenario 7: Change in land use/land cover + RC 8.5