Study area
The Southeastern region of Brazil (SEB) includes Espírito Santo, Minas Gerais, São Paulo, and Rio de Janeiro states, and is one of the most developed regions in Brazil. The topography of the SEB region is complex, with extensive valleys (Paraíba do Sul, Jequitinhonha, and São Francisco do Sul), and mountains (e.g., Serra do Mar, and Mantiqueira), and altitudes ranging from 0 to 2800 m (Santos et al. 2022). The land use and land cover are mainly for agriculture (63%), forests (29%), non-forest natural formations (4%), non-vegetated areas (2%), and water (1%) (Souza et al. 2020). The study area either completely or partially includes all the states in the SEB, Brazil, specifically between longitudes 40°W and 47°W, and latitudes 20°S and 24°S. The area encompasses the metropolitan and northern coastal regions of São Paulo, the south and Zona da Mata regions of Minas Gerais, the metropolitan region of Vitória in Espírito Santo, and Rio de Janeiro (Fig. 1).
The Rio de Janeiro state extends between parallels 20.5 and 23.5° S (approximately 300 km long North to South), and meridians 41 and 45° W (approximately 400 km long West to East) (Silva and Dereczynski 2014). It contains 92 municipalities, with an area of 43,750,426 km2, and an estimated population is 17,366,189 (IBGE 2020). The state has metropolitan regions with high population densities, which are forested, used for agriculture, or covered by Atlantic Forest (Souza et al. 2020). Rio de Janeiro is bordered on the Northeast by Espírito Santo, on the North and Northwest by Minas Gerais, on the Southwest by São Paulo, and on the South and East by the Atlantic Ocean. The coastline is approximately 635 km long, and the topography is characterized by extensive coastal floodplains and altiplanos to the west (Brito et al. 2017).
The São Paulo state region covered in this study includes the mountainous regions of the Mantiqueira (Campos do Jordão and Taubaté), and the Serra da Cantareira (Guarulhos and Mirante de Santana), where remnants of Atlantic Forest can be seen (Oliveira et al. 2016). The Southern region of Minas Gerais comprises the Serra da Mantiqueira, areas with agricultural activities, and Atlantic Forest. The Zona da Mata borders Rio de Janeiro and Espírito Santo, and has considerable agricultural areas (Souza et al. 2020). Minas Gerais has an altimetric gradient varying between 30 and 1200 m. The metropolitan region of Espírito Santo is a tropical coastal area, with an estimated population of 1.8 million inhabitants, and is the main urban and industrial area of Espírito Santo (Brazil) (IBGE 2020; Andreão et al. 2020).
Observed data series
The observed monthly data series used to estimate the ETo using the PM–FAO56 method (Allen et al. 1998), and PET by the Thornthwaite (1948) method were obtained from conventional weather stations (CWS) belonging to the Brazilian National Meteorological Institute (Instituto Nacional de Meteorologia - INMET). Twenty-one stations across four states in the SEB were used i.e., ten stations in Rio de Janeiro, one in Espírito Santo, six in Minas Gerais, and four in São Paulo (Table 1). The twenty-one stations were selected based on the availability of continuous monthly data series and with few gaps in the period under analysis (see details below) and covering the Southeast region of Brazil.
Table 1
Location (Municipality, State, latitude – Lat., longitude – Long. and altitude – AL) of the weather stations of the Instituto Nacional de Meteorologia (INMET), World Meteorological Organization (WMO) code, identifier – ID, climate series (start, end, and length) and Thornthwaite climate classification.
|
|
|
|
|
|
|
Dataset
|
|
|
|
|
|
Lat.
|
Long.
|
Alt.
|
Start-End
|
Lenght
|
Thornthwaite climate
|
ID
|
OMM
|
Municipality
|
State
|
(°)
|
(°)
|
(m)
|
(year)
|
(year)
|
classification1
|
VIT
|
83648
|
Vitória
|
ES
|
-20,31
|
-40,31
|
36
|
1961–2005
|
44
|
C1dA'a'
|
PAC
|
83037
|
Coronel Pacheco
|
MG
|
-21,55
|
-43,26
|
435
|
1966–2009
|
43
|
B2rB'4a'
|
CAP
|
83639
|
Caparaó
|
MG
|
-20,51
|
-41,90
|
843
|
1973–2010
|
37
|
B1rB'3a'
|
VIC
|
83642
|
Viçosa
|
MG
|
-20,76
|
-42,86
|
712
|
1961–2010
|
49
|
B1rB'3a'
|
BAR
|
83689
|
Barbacena
|
MG
|
-21,25
|
-43,76
|
1126
|
1961–2010
|
49
|
B3rB'2a'
|
JDF
|
83692
|
Juiz de Fora
|
MG
|
-21,76
|
-43,36
|
940
|
1961–2010
|
49
|
B3rB'3a'
|
SLR
|
83736
|
São Lourenço
|
MG
|
-22,1
|
-45,01
|
953
|
1961–2010
|
49
|
B3rB'3a'
|
ALF
|
83049
|
Paty do Alferes
|
RJ
|
-22,35
|
-43,41
|
507
|
1992–2010
|
18
|
C2rB'4a'
|
MAR
|
83089
|
Maricá
|
RJ
|
-22,91
|
-42,81
|
4
|
1993–2009
|
16
|
C1dA'a'
|
IGB
|
83114
|
Iguaba Grande
|
RJ
|
-22,83
|
-42,18
|
6
|
1992–2003
|
11
|
DdA'a'
|
ITA
|
83695
|
Itaperuna
|
RJ
|
-21,2
|
-41,90
|
124
|
1967–2010
|
43
|
C1d A'a'
|
SMM
|
83696
|
Santa Maria Madalena
|
RJ
|
-21,95
|
-42,00
|
620
|
1961–1979
|
18
|
B2rB'3a'
|
GOY
|
83698
|
Campos dos Goytacazes
|
RJ
|
-21,74
|
-41,33
|
11
|
1961–2000
|
39
|
C1d A'a'
|
COR
|
83718
|
Cordeiro
|
RJ
|
-22,02
|
-42,36
|
506
|
1971–2010
|
37
|
B1rB'4a'
|
RES
|
83738
|
Resende
|
RJ
|
-22,45
|
-44,44
|
440
|
1961–2010
|
49
|
B1rB'4a'
|
RIO
|
83743
|
Rio de Janeiro
|
RJ
|
-22,89
|
-43,18
|
11
|
1961–1983
|
22
|
C1d A'a'
|
ANG
|
83788
|
Angra dos Reis
|
RJ
|
-23,01
|
-44,31
|
3
|
1965–1983
|
18
|
B3rA'a'
|
GRU
|
83075
|
Guarulhos
|
SP
|
-23,43
|
-46,46
|
735
|
1986–2010
|
24
|
B2rB'4a'
|
JOR
|
83714
|
Campos do Jordão
|
SP
|
-22,75
|
-45,60
|
1642
|
1961–1995
|
34
|
ArB'2a'
|
MIR
|
83781
|
Mirante de Santana
|
SP
|
-23,5
|
-46,61
|
792
|
1962–2005
|
43
|
B2rB'3a'
|
TAU
|
83784
|
Taubaté
|
SP
|
-23,03
|
-45,55
|
577
|
1962–2005
|
43
|
B1rB'4a'
|
1 – Thornthwaite’s climate classification (1948): A – Perhumid; B1, B2 and B3 - Humid; C1- Dry Subhumid; C2 - Subhumid; D - Semiarid; r – without or with low water deficit in winter; d – without or with low surplus in summer; A' - megathermal/tropical; B'2, B'3 and B'4- mesothermal; a' - less than 48% of the potential annual evapotranspiration observed in the summer. |
14 CWS (66.7%) had monthly data series covering more than 30 years. The maximum coverage was 49 years (Viçosa, Barbacena, São Lourenço, Juiz de Fora, and Resende). Two stations (9.5%), had data spanning 20 to 30 years, and four stations (19%), had data spanning 15 to 20 years, while only one station had data spanning 11 years (Iguaba Grande) (Santos et al. 2018a). All monthly data series were inserted in a period from 1961 to 2010. Stations in Minas Gerais had longer data series and fewer gaps. By contrast, stations in Rio de Janeiro had fewer data series and had more gaps. We also decided to use some data series from the CWS with less than 30 years to provide greater spatial representation, since the region has complex reliefs and is close to the coast (Santos et al. 2018b).
The following variables were used: sunshine duration (n, hours), air temperature extremes (maximum – Tx and minimum – Tn, °C), relative humidity (RH, %), measured at 1.5 m above the surface, and wind speed (u10, m s− 1) measured at 10 m. The CWS data was first submitted to quality control (assessment of inconsistencies), using validation techniques, and errors identification, according to recommendations from the World Meteorological Organization (WMO), technical report No. 8 (WMO 2014) and Annex 5, from FAO56 (Allen et al. 1998).
Gridded climate datasets
The Thornthwaite method for estimating PET with GCD uses monthly air temperature data from the Global Historical Climatology Network (GHCN) (available at https://psl.noaa.gov/data/gridded/data.ghcngridded.html), and University Delaware data (available at https://www.esrl.noaa.gov/psd/data/gridded/data.UDel_AirT_Precip.html#detail).
A monthly database, generated by combining two large individual observation datasets collected from the Global Historical Climatology Network (version 2), and the Climate Anomaly Monitoring System (GHCN + CAMS), was used for the GHCN product. This dataset differs since it combines two large individual datasets on surface meteorological observations, and since it employs unique interpolation methods (Fan and den Dool 2008).
The GHCN + CAMS has a set of monthly air temperature data, available from 1948 to 2010, which is flawless and homogeneously distributed in space (Fan and den Dool 2008). These are important characteristics for calculating PET estimations. GHCN data were interpolated to a global spatial resolution at 0.5 x 0.5° (Willmott and Matsuura 2001). The interpolation algorithm was the distance-weight method version of Shepard (1968), with grid nodes centered at 0.25°. The number of stations influencing each grid, on average, was 20 resulting in more realistic air temperature data (Willmott and Matsuura 2001).
The UDel data comprises the GHCN observation database (version 2), together with the Legates and Willmott (1990a, b) databases. The base monthly air temperature averages for the UDel were interpolated to 0.5 x 0.5° latitude and longitude, with nodes centered at 0.25°, similar to the GHCN, which is also based on the distance-weight method version of Shepard (1968), (Matsuura and Willmott 2018).
Monthly air temperature data series obtained from the gridded products were extracted so the grid points were as close as possible to the INMET CWS. The extraction of the series in the grid in NetCDF format was carried out with the aid of the MeteoInfo Software (Wang 2014). We obtained as many grid points as stations, i.e., 21 in total. We chose only observations with dates (month/year) that coincided with dates in the INMET data, so each grid point would have the amount of data as the closest INMET station.
Reference evapotranspiration (ETo)
The Bulletin No. 56 on irrigation and drainage from the Food and Agriculture Organization (FAO-56), (Allen et al. 1998), proposes using a standard Penman-Monteith method that is parametrized for hypothetical vegetation surface (grass) for estimating evapotranspiration:
\({\text{ET}}_{\text{o}}\text{= }\frac{\text{0.408 ∆}\left({R}_{n}\text{-G}\right)\text{+}\text{γ}\frac{\text{900}}{\text{T+273}}{\text{u}}_{\text{2}}\left({\text{e}}_{\text{s}}\text{-}{\text{e}}_{\text{a}}\right)}{\text{∆+}\text{γ}\left(\text{1+0.34}{\text{u}}_{\text{2}}\right)}\)
(1)
where ETo is the reference evapotranspiration (mm d− 1); Rn is the net radiation (MJ m− 2 d− 1); G is the soil heat flux (MJ m− 2 d− 1), which is zero on a daily scale (Allen et al. 1998); T is the daily mean air temperature [T = (Tx + Tn)/2] at 2 m (°C); u2 is the wind speed at 2 m (m s− 1); es is the daily mean saturation water vapor pressure (kPa °C− 1); ea is the daily average actual water vapor pressure (kPa); Δ is the slope of the saturation vapor pressure-temperature using mean air temperature (kPa °C− 1); and γ is the psychrometric coefficient (kPa °C− 1). These parameters were calculated according to the recommendations described in FAO bulletin nº 56 using monthly average daily sunshine duration (n, hours), air temperature extremes (maximum – Tx and minimum – Tn, °C), relative humidity (RH, %), and wind speed (u2, m s− 1). The psychrometric coefficient was estimated from atmospheric pressure calculated using altitude (z, m) of the CWS (Allen et al. 1998).
To adjust wind speed measured at 10 m to the standard height (2 m) we using a logarithm wind speed profile:
\({\text{u}}_{\text{2}}\text{= }{\text{u}}_{\text{10}}\frac{4.87}{\text{l}\text{n}(67.8 z-5.42)}\)
(2)
where z is the height of measurement above the ground surface (10 m).
Potential Evapotranspiration (PET)
Potential evapotranspiration (PET, mm) is defined by Thornthwaite (1948), as thermal efficiency index and air temperature, which is the only variable used in the PET method. Thornthwaite's method first estimates PET under standard conditions (PETs), i.e., 12 hours of duration of daylight, and for a month with 30 days, is as follows (Tanguy et al. 2018; Liu and Liu 2019):
\({PET}_{s}\text{ =}{\text{16}\left(\text{10}\frac{{T}_{i}}{\text{I}}\right)}^{\text{a}}\)
(3)
where Ti is the mean monthly air temperature (ºC); which is given by:
\(\text{a = 6.75}{\text{ 10}}^{\text{-7}}{\text{I}}^{\text{3}} \text{- 7.71 }{\text{10}}^{\text{-5}}\text{ }{I}^{2}\text{ + - 1.7921 }{\text{10}}^{\text{-2}}\text{ }\text{I}\text{+ 0.4923}9\)
(4)
where I is the heat index of the region, which is calculated using the following equation:
\(\text{I =}{\sum }_{\text{i=1}}^{\text{12}}{\left(\text{0.2}{T}_{i}\right)}^{\text{1.514}}\)
(5)
The I index changes for each region, and must be calculated with climatological mean air temperatures (Tanguy et al. 2018; Liu and Liu 2019), i.e., Ti is the average of all studied years, calculated for each month.
To account for variable day and month lengths PETs must be adjusted to:
\(\text{PET = PETs}\frac{\text{N}}{\text{12}}\frac{\text{ND}}{\text{30}}\)
(6)
where ND corresponds to the number of days in a month where PET (corrected potential evapotranspiration), is calculated (Tanguy et al. 2018; Liu and Liu 2019) and N (h) is the duration of daylight (in hours) on the fifteenth of the month given via:
\(\text{N = }\frac{\text{24}}{\text{π}}{\text{ω}}_{\text{s}}\)
(7)
where ωs is the hourly angle between sunrise and sunset (radians) given by:
\({\text{ω}}_{\text{s}}\text{= }\text{arccos}\left[\text{-tan}\left(\text{φ}\right)\text{tan}\left(\text{δ}\right)\right]\)
(8)
where φ is the local latitude (radians); and δ = solar declination (radians).Thornthwaite method estimates monthly total PET (mm month-1), to determined monthly average daily PET (mm d-1), the monthly total was divided by de ND. Although the Thornthwaite method estimates PET, the monthly average daily PET, it numerically approaches ETo (estimated by the PM-FAO56 method), making comparisons possible.
Statistical analysis
The precision and accuracy of the PET estimations using GCD (GHCN and UDel data) were assessed using the following descriptive statistical, and statistical indices: simple linear regressions with an intercept (β0), and slope (β1), coefficient of determination (r2), Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Square Error percentage (MSE%), Willmott’s agreement index (dw) (Willmott 1981), and the systematic (MESs) and non-systematic (MSEn) errors from MSE, as described below.
Simple linear regression describes how variable Y is related to variable X:
\({Y}_{i}\text{ = }{\text{β}}_{\text{0}}\text{+}{\text{β}}_{\text{1}}{X}_{i}\)
(9)
where Y is the PET fed by GHCN or UDel data; X is the PET or ETo obtained by CWS data from INMET; β0 is the the Y-intercept regression line; β1 is the slope of simple regression between Y and X; and subscript i is the i-th observation. β0 should be as close to zero as possible, since this is the Y-intercept, and β1 should be as close to 1 as possible.
The coefficient of determination (r2) (Eq. 10) measures the intensity relationship between observed and predicted values (Althoff et al. 2020). r2 values closest to 1, indicates a better fit, and is thus a highly reliable relationship between observed and predicted values.
\({r}^{2}={\left[\frac{{\sum }_{i=1}^{n}\left({X}_{i}-\stackrel{-}{X}\right)\left({Y}_{i}-\stackrel{-}{Y}\right)}{\sqrt{{\sum }_{i=1}^{n}{\left({X}_{i}-\stackrel{-}{X}\right)}^{2}+{\sum }_{i=1}^{n}{\left({Y}_{i}-\stackrel{-}{Y}\right)}^{2}}}\right]}^{2}\)
(10)
where \(\stackrel{-}{X}\) is the PET or ETo mean with observed data and \(\stackrel{-}{Y}\) is the PET mean with gridded climate dataset (GHCN or UDel); and n is the number of observations.The RMSE measures the goodness of fit, increasing from zero, for perfect predictions, to positive values. Higher RMSE values mean higher discrepancy between the observed and the GCD used to feed the PET (Wilks 2006; Althoff et al. 2020).
\(RMSE=\sqrt{\frac{{\sum }_{i=1}^{n}{\left({Y}_{i}-{X}_{i}\right)}^{2}}{n}}\)
(11)
The coefficient of determination (r2) consistently describes proportional increases or decreases of two related variables. However, it does not distinguish between types or magnitudes of possible covariances very well (Willmott 1981). To deal with certain problems related to r2, Willmott (1981), proposed an index of agreement (dw), expressed in:
\({d}_{w}\text{ = 1-}\left[\frac{\sum {\left({\text{Y}}_{\text{i}}\text{-}{\text{X}}_{\text{i}}\right)}^{\text{2}}}{\sum {\left(\left|{Y}_{i}\text{-}\stackrel{-}{X}\right|\text{+}\left|{X}_{i}\text{-}\stackrel{-}{X}\right|\right)}^{\text{2}}}\right]\)
(12)
The Willmott index of agreement reflects the degree to which observed variables are estimated using simulated variables (Willmott 1981). In this study, the first (Xi), is the PET or ETo estimated using observed data from INMET CWS, while the second (Yi), is the PET estimated using GHCN or UDel gridded data. The dw values range from 0 (total disagreement) to 1 (total agreement).
Beyond the statistics (and indices) aforementioned, it is important to quantify the systematic, and non-systematic (random) errors from MSE values (Willmott 1981).Willmott (1981), defines systematic error as:
\({MSE}_{s}\text{ = }\frac{\text{1}}{\text{n}}\sum _{\text{i=1}}^{\text{n}}{({\widehat{Y}}_{i}-{X}_{i})}^{2}\)
(13)
and non-systematic error as:
\({MSE}_{n}\text{ = }\frac{\text{1}}{\text{n}}\text{ }\sum _{\text{i=1}}^{\text{n}}{{(Y}_{i}-{\widehat{Y}}_{i})}^{2}\text{ }\)
(14)
where: \(\widehat{Y}\) is PET estimated from Eq. 8.
According to Willmott (1981), the systematic difference must approach zero if a simulation is to be considered satisfactory, while the non-systematic difference must approach the MSE. The MSE was calculated from MSE = MSEu + MSEn.
Significance testing of the Spearman rank correlation coefficient (Zar 1972) was performed at a 5% confidence level, between statistical and error indices (r2, MSE, dw, MSEs and MSEn), climatic indices from Thornthwaite e Mather (1955) (Iu, Ih and Ia), and geographic variables such as latitude, longitude, altitude, distance from the station to the grid point (DGP), and difference in height (DH), obtained for each PET and ETo estimation method. The Spearman correlation was calculated by comparing: i) the Thornthwaite method (1948) feed with gridded climate data from the GHCN and UDel against the same PET method feed with observed data from CWS (ThWGHCN vs ThWObs and ThWUDel vs ThWObs); and ii) the Thornthwaite method (1948) fed with GHCN or UDel data and the PM-FAO56 method (ThWGHCN vs PM-FAO56 and ThWUDel vs PM-FAO56). The results were shown using a graphical display of a correlation matrix to identify whether the highest/smallest errors and lowest/highest accuracy and precision indices were related to geographic and/or climatological factors.