Urban Heat Island and Electrical Load Estimation Using Machine Learning in Metropolitan Area of Rio de Janeiro

We developed a daily electrical load forecasting model for the State of Rio de Janeiro and a monthly model for each Light concessionaire substation in the Metropolitan Area of Rio de Janeiro (MARJ). The data used are 1) daily National System Operator (ONS) electrical load data for the State of Rio de Janeiro over four years (2017–2020); 2) the monthly electrical load of 84 Light substations over 11 years (2010–2020); and 3) the maximum, minimum, and mean air temperature. In addition, remotely sensed land-surface temperature (LST) based on Landsat data from 1984 to 2020 is used to restructure the current meteorological network in the MARJ based on the layout of the Light substations. Using cross-validation, we performed 500 daily and 500 monthly training-testing experiments with five regression machine-learning algorithms. Results for daily-ONS and monthly-Light loads show average correlations (hindcast in parentheses) of the fitted models of 0.85 ± 0.09 (0.83 ± 0.07) and 0.89 ± 0.05 (0.91 ± 0.06), respectively. The models' mean absolute error (MAE) values correspond to a percentage error of about 4.03% (daily) and 4.83% (monthly). According to the monthly electrical load behavior, when the temperature rises from 23 to 26 °C in the MARJ, the load rises roughly from 1.92x10(cid:0) ± 67227.4 kWh to 2.70x10(cid:0) ± 90198.5 kWh. We performed a cluster analysis based on the locations of 1) the 18 meteorological stations currently installed, 2) the 84 Light electrical load substations, and 3) the urban heat island cores. Results reveal seven locations where new meteorological stations should be installed to model the electrical load with higher spatial resolution in the MARJ.


Introduction
Energy is one of the most vital resources in daily life, and electricity is essential for human activities and progress. Intrinsically, it serves as an indicator of a nation's economic health and is critical to national economies (Jasiński 2019; Hutasavi and Chen 2021). Forecasting electric energy load is critical for the proper operation of the electric network, which is subject to a set of stringent requirements in the electric power sector. The predictions are also crucial for businesses to make decisions about the production and procurement of energy in the future market, which could consequently have an impact on infrastructure development, operating costs, and energy efficiency (del Real et al. 2020).
Forecasting the demand for electric energy is very challenging, since it depends on several parameters that interact in a complex way, including socioeconomic, corporate, and personal factors. Energy demand may be related to factors such as population growth, demographic factors, income, the number of electrical appliances, the working day, and the time of the day, and these interactions occur at various spatial and temporal scales. Forecast models are presently developed based on the time horizon (short, medium, and long term), temporal resolution (hourly, daily, monthly), and spatial resolution (e.g., residence, regional, country level) (Verwiebe et al. 2021).
Because variations in demand reflect consumers' reactions to weather conditions at a particular time, meteorological variables, particularly air temperature, have a significant influence on energy consumption patterns. Electricity demand fluctuations in tropical countries are generally related to the use of cooling equipment. Temperature is therefore an essential meteorological parameter for the behavior of electrical energy demand, because approximately 99% of the energy used for space cooling is electrical (Azevedo et al. 2016; Zhang et al. 2021).
Understanding electricity demand as a function of temperature is important not only for forecasting that takes seasonal temperature variations into account, but also for assessing the impact of climate change on energy systems. The effect of rising temperatures on electricity demand also extends to the useful life of concessionaires' equipment, including conservation and replacement planning. This relationship is difficult to characterize because feedback processes are at play. At the global level, climate change affects the need for electricity for cooling, while producing that electricity can in turn exacerbate warming by increasing greenhouse gas emissions. On a smaller scale, the urban climate is a complex and distinctive system that is unique to each city, with significant changes in temperature, atmospheric circulation, albedo, heat storage, evapotranspiration, and other aspects of the energy balance at the surface. The main manifestation of urban climate, one of the major environmental issues of the twenty-first century, is the urban heat island (UHI), which is characterized by higher temperatures in various urban spaces when compared to non-urban (rural) spaces (Rizwan et al. 2008), due to factors such as atmospheric pollution and flooding.
The UHI is distinguished by three major characteristics: the shape, intensity, and location of its hottest core. These aspects vary depending on the time of day, the season of the year, weather, geographic location, natural morphology such as hills, water bodies, and green areas, and the thermal properties of the materials that make up the city (Voogt 2002). The UHI has been studied using in situ measurements and remote sensing data, sometimes together, sometimes separately, taking advantage of remote sensing's high spatial resolution and in situ measurements' high temporal resolution (Rizwan et al. 2008; Grimmond 2006; Stewart 2011). Several studies have demonstrated the impact and effects of UHI on rising energy consumption in cities (Hwang et al. 2017; Huang and Gurney 2016; Li et al. 2019). UHI intensity has been used to estimate energy usage, based on land-surface temperature (LST) data and the quantity of nighttime illumination, both derived from orbital remote sensing data (Liao et al. 2017).
Light is an electric energy concessionaire that serves 4 million customers across 31 municipalities in the Metropolitan Area of Rio de Janeiro (MARJ). The distribution of electricity in the MARJ is difficult due to high temperatures, which exceed 30 °C during the spring and summer months, and systematic energy fraud, which causes significant commercial losses to Light. According to Peres et al. (2018), the variation in UHI intensity in the MARJ is 4.4 °C between urban and rural/low-density urban areas and 7.1 °C between urban and vegetation areas. In addition, an increase in temperature over time is observed, of 1.9 °C and 0.9 °C, respectively, in the two land cover classes in which Light's customers are concentrated ("urban" and "rural/low urban density").
The study of the relationship between electricity demand and air temperature is critical for efficient energy management by electric energy concessionaires. To properly establish this relationship, identifying critical temperature thresholds and including the impacts of UHI, global warming, and extreme weather events caused by climate change (e.g., heat waves), adequate spatial and temporal characterization of the temperature within the study area is required.
As a result, the current work has two main goals: 1. examine how the current network of weather stations is spatially distributed and determine, using cluster analysis, the ideal number of stations to reflect the thermographic behavior of the current UHIs in Light's concession area (the MARJ); and 2. develop a temperature-dependent load prediction model and validate it for a case study.
Information regarding the LST over the MARJ, which is derived from remote sensing data, specifically from the Landsat series of satellites, is helpful for both objectives. The intensity of the UHI in the MARJ, as calculated from the LST, as well as the locations of pre-existing meteorological stations over the MARJ, are used to define the spatial distribution of the in situ air temperature monitoring network.

Study area
The MARJ is one of the eight governmental regions of the state of Rio de Janeiro, located in southeastern Brazil on the west coast of the Atlantic Ocean. For the purposes of this work, the MARJ is defined by 19 municipalities, according to the regionalization of the State Center for Statistics, Research, and Training of the Public Servant of Rio de Janeiro, namely: Belford Roxo, Duque de Caxias, Guapimirim, Itaboraí, Itaguaí, Japeri, Magé, Maricá, Nilópolis, Niterói, Nova Iguaçu, Paracambi, Queimados (Figure 1a). This regionalization predates the current one (which has three additional municipalities), but it fits within the spatial limits of the Landsat scene.

Landsat image pre-processing and LST estimation
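The UHI data for the MARJ were derived from remote sensing information from the Landsat-5, -7, and -8 satellites. A total of 190 images from the United States Geological Survey, covering the years 1984 to 2020, were used (available from https://earthexplorer.usgs.gov/). To estimate the LST over the MARJ, the Landsat images underwent the pre-processing procedures listed below:
1. georeferencing based on the Geocover 2003 database (Peres et al. 2018) and geometric correction in ENVI 4.7 using the cubic convolution interpolation approach in cartographic projection (WGS 84). Landsat-5 and Landsat-7 thermal band 6 were resampled to the same spatial resolution as the other bands;
2. radiometric calibration, which entails converting digital numbers into radiance; the solar reflective bands were then converted into reflectance and the thermal bands into brightness temperature (BT);
3. cloud masking, based on research by França and Cracknell (1995) and Chen et al. (2003), who employed three tests to determine whether or not a pixel was cloud contaminated. The first is based on a red band reflectance threshold, which was determined to be 0.3 for the MARJ after multiple experiments. The second is a BT threshold: pixels with BT at or below 278 K are treated as cloud contaminated. The third considers the ratio between the reflectance values of the near-infrared and red bands, where values close to 1 indicate pixels contaminated by clouds, values less than 0.8 indicate water, and values greater than 1.4 indicate vegetation; and
4. LST estimation from the cloud-free pixels thus identified, based on prior work (Qin et al. 2001; Jiménez-Muñoz and Sobrino 2003; Lucena et al. 2012).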
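The three cloud tests used in the masking step (red reflectance threshold of 0.3, BT at or below 278 K, and a near-infrared/red ratio close to 1) can be sketched per pixel as follows. This is only an illustrative sketch, not the processing chain used in the study: the function names, the direction of the red-band test (brighter than the threshold), the window taken as "close to 1" (0.9 to 1.1), and the rule that any single positive test flags the pixel are all assumptions.

```python
def cloud_tests(red_reflectance, nir_reflectance, brightness_temp_k,
                red_threshold=0.3, bt_threshold_k=278.0,
                ratio_window=(0.9, 1.1)):
    """Evaluate the three cloud tests for one pixel and return their flags.

    ratio_window is a hypothetical definition of "ratio close to 1".
    """
    if red_reflectance > 0:
        ratio = nir_reflectance / red_reflectance
    else:
        ratio = float("inf")  # avoid division by zero on dark pixels
    return {
        # Assumed direction: clouds are brighter than the red threshold.
        "bright_red": red_reflectance > red_threshold,
        # Pixels at or below 278 K are treated as cloud contaminated.
        "cold": brightness_temp_k <= bt_threshold_k,
        "ratio_near_one": ratio_window[0] <= ratio <= ratio_window[1],
    }


def surface_hint(red_reflectance, nir_reflectance):
    """Use the NIR/red ratio to hint at water (< 0.8) or vegetation (> 1.4)."""
    if red_reflectance <= 0:
        return "other"
    ratio = nir_reflectance / red_reflectance
    if ratio < 0.8:
        return "water"
    if ratio > 1.4:
        return "vegetation"
    return "other"


def is_cloud_contaminated(flags):
    """Assumed combination rule: any positive test flags the pixel."""
    return any(flags.values())


cloudy = cloud_tests(0.35, 0.36, 290.0)
clear = cloud_tests(0.05, 0.30, 300.0)
```

Keeping the three flags separate makes it easy to try a stricter combination rule (for example, requiring two positive tests) without touching the thresholds.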

UHI intensity and MARJ thermal field monitoring network
A suitable spatial and temporal characterization of the MARJ thermal field is required to demonstrate the relationship between power demand and air temperature. In this context, the regions of the MARJ (identified by their closest substation) where the existing network should be complemented with additional meteorological stations are determined based on the intensity and location of the UHIs, the geographical arrangement of the automatic weather stations (AWS), and the spatial representativeness of this network for collecting meteorological data.
The intensity of the UHI was calculated using the average LST composite from 1984 to 2020. This composite is also intended to show, in absolute terms, the spatial distribution of the LST over the MARJ, making it possible, in conjunction with the UHI intensity, to identify the regions with greater thermal discomfort caused by urbanization. UHI intensity was computed from percentiles of the distribution of mean LST values for the years 1984 to 2020. The calculation of percentiles excluded locations with elevations greater than or equal to 100 m. The following three intensity classes were used to determine the hotspots in the MARJ: Class 1 (hot): p75 < LST ≤ p90; Class 2 (very hot): p90 < LST ≤ p99; Class 3 (extremely hot): LST > p99.
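The percentile-based classes can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the percentiles use simple linear interpolation, and applying the thresholds to every pixel (including those at or above 100 m, which are only excluded from the percentile calculation) is an assumption.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("at least one value is required")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def classify_uhi(lst_values, elevations_m, max_elevation_m=100.0):
    """Return a class (0-3) per pixel and the (p75, p90, p99) thresholds.

    Thresholds come only from pixels below max_elevation_m; class 0 means
    the pixel is not a hotspot (LST <= p75).
    """
    lowland = sorted(
        lst for lst, z in zip(lst_values, elevations_m) if z < max_elevation_m
    )
    p75, p90, p99 = (percentile(lowland, q) for q in (75, 90, 99))
    classes = []
    for lst in lst_values:
        if lst > p99:
            classes.append(3)  # extremely hot
        elif lst > p90:
            classes.append(2)  # very hot
        elif lst > p75:
            classes.append(1)  # hot
        else:
            classes.append(0)
    return classes, (p75, p90, p99)


# Toy example: 101 synthetic LST values (0..100), all at sea level.
toy_lst = [float(v) for v in range(101)]
toy_classes, toy_thresholds = classify_uhi(toy_lst, [0.0] * 101)
```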
In addition to the percentile method, we ranked Rio de Janeiro's 10 warmest neighborhoods based on the LST composite over the full study period. The idea behind ranking the city's hottest areas based on the LST composite is to identify and validate the locations that need more attention, since they are more susceptible to heat stress and, as a result, use more energy. Rio de Janeiro was chosen for this neighborhood ranking because of the energy concessionaire's substantial investment there, for which a better understanding of the city's hottest areas is a crucial component.
To determine the optimal number of meteorological stations to supplement the current meteorological data collection network, the following procedure is used: 1. substations are classified according to two quantitative variables, namely the energy load supplied and the distance from the closest weather station; 2. the variables are normalized, and the number of substation groups is determined by trial and error using the k-means algorithm; and 3. by examining the best configuration of the K substation clusters with which Light's concession area can be properly monitored, the optimum number of AWS to complement the existing network of meteorological stations is determined.
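The clustering procedure can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the implementation used in the study: min-max scaling is assumed for the normalization, the within-cluster sum of squares (WCSS) is assumed as the quantity compared across trial values of k, and the toy loads and distances are invented.

```python
import math
import random


def min_max_normalize(column):
    """Scale a list of numbers to the [0, 1] range."""
    low, high = min(column), max(column)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in column]


def assign_labels(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for point in points:
        distances = [math.dist(point, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def kmeans(points, k, max_iterations=100, seed=0):
    """Basic k-means; returns labels, centroids, and the WCSS."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iterations):
        labels = assign_labels(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    labels = assign_labels(points, centroids)
    wcss = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return labels, centroids, wcss


# Invented toy data: substation load and distance to the nearest station.
loads_kwh = [10.0, 12.0, 11.0, 100.0, 102.0, 101.0]
distances_km = [1.0, 1.2, 1.1, 9.0, 9.2, 9.1]
points = list(zip(min_max_normalize(loads_kwh), min_max_normalize(distances_km)))

# Trial and error over candidate numbers of clusters.
wcss_by_k = {k: kmeans(points, k)[2] for k in (1, 2, 3)}
labels, centroids, _ = kmeans(points, 2)
```

Comparing `wcss_by_k` across candidate values of k is one common way to make the trial-and-error choice explicit: the preferred k is where adding another cluster stops reducing the WCSS appreciably.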

Electrical load estimation model
Light has 84 substations, for which only monthly load values (kWh) are recorded. As a result, the temperature-dependent monthly load forecasting model is built as follows: 1. Light's monthly loads (LML) for the 84 substations are summed, and this total (in kWh) serves as the estimation target (output); 2. the LML (output) is paired with observational records (inputs) of the minimum, average, and maximum temperatures in Light's concession area; 3. using the WEKA software (version 3.9.6) (Hall et al. 2009), several suitable regression machine-learning algorithms are trained using cross-validation, which separates the entire input and output data set into n mutually exclusive subsets of the same size, with one subset used for testing and the remaining n-1 used for parameter estimation, in order to determine the algorithm's accuracy; 4. in a similar manner, steps 1-3 are repeated using the National Electric System Operator (ONS) data on the State of Rio de Janeiro's total daily load (output), with the minimum, average, and maximum temperatures in the MARJ serving as inputs; and 5. both the daily-ONS load for the State of Rio de Janeiro and the monthly-Light load forecasts are tested via hindcast with a separate dataset.
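The cross-validation scheme in step 3 can be sketched as follows. This is a simplified stand-in rather than the WEKA workflow: it uses a single predictor (for example, mean temperature) with an ordinary least-squares line, the folds are contiguous rather than randomized, fold sizes may differ by one when n does not divide the sample count, and all function names and toy values are hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds mutually exclusive index lists."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(x, y, n_folds):
    """Train on n_folds - 1 subsets, test on the held-out one; MAE per fold."""
    fold_mae = []
    for test_idx in k_fold_indices(len(x), n_folds):
        held_out = set(test_idx)
        train_x = [x[i] for i in range(len(x)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        errors = [abs(y[i] - (slope * x[i] + intercept)) for i in test_idx]
        fold_mae.append(sum(errors) / len(errors))
    return fold_mae


# Toy data: load grows linearly with temperature (invented values).
temperature_c = [float(t) for t in range(20, 30)]
load = [2.0 * t + 1.0 for t in temperature_c]
fold_errors = cross_validate(temperature_c, load, n_folds=5)
```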

UHI in MARJ
Figures 1b and 1c depict the average LST and intensity of the UHI for 1984-2020. The thermal pattern in the maps shows a concentration of the highest LST values, in shades of red and orange, that covers a large portion of the Baixada da Guanabara and Baixada Fluminense, which includes neighborhoods in Rio de Janeiro's northern zone and the municipalities of Duque de Caxias, São João de Meriti, Nilópolis, Mesquita, Belford Roxo, and Nova Iguaçu. The concentration also extends to Rio de Janeiro's south and west zones, though it is sparser in the latter. Finally, these higher LST values extend to the lowlands of Guanabara Bay's east bank, which include the municipalities of Niterói, São Gonçalo, and Maricá. This concentration corresponds to Class 3 (LST > p99), with the highest (extremely hot) LST values.
Many of the warmer regions described so far are surrounded by cooler pockets where the LST drops to 35 °C or even below 30 °C, which means that they fall mainly in Class 1 (hot; p75 < LST ≤ p90) and then in Class 2 (very hot; p90 < LST ≤ p99). This occurs in settlements or areas encircled by large massifs or small mountains, which raise the air temperature and confine the atmospheric circulation within a small area. The central area of Rio de Janeiro has the highest concentration of Class 3 (LST > p99) values.
Despite the differences in absolute values, the results for both summer and winter over the entire period (found at https://github.com/aa-vinicius/sisprecarga) show the same spatial pattern of LST, allowing for the observation of the extreme LST values found in the MARJ, which indicate, as would be expected, lower temperatures in winter and higher temperatures in summer. In some areas, there is a temperature difference of more than 10 °C between summer and winter. For the annual LST pentads (found at https://github.com/aa-vinicius/sisprecarga), each pentad comprises a different number of images, and the distribution of images across months also differs and carries meteorological and climatic variability, so the evolution of LST should be analyzed with caution. However, it is evident that the highest values of LST occurred in the most recent ten years (the 2011-2015 and 2016-2020 pentads).
On the other hand, the division into pentads allows for the identification of surface changes, such as the construction of the petrochemical complex in the municipality of Itaboraí, which was observed between 2011 and 2015. The spatial average of LST for each neighborhood in Rio de Janeiro was calculated using the average LST for the entire period (1984-2020). Figure 1d depicts the ranking of the ten hottest neighborhoods from 1984 to 2020, namely Vasco da Gama, Jacaré, Cidade Nova, Bonsucesso, Higienópolis, Del Castilho, Praça da Bandeira, Vila da Penha, Ramos, and Benfica, whose average LST ranged from 33.9 to 34.5 °C.

Restructuring of the AWS network in MARJ
The 84 substations (black circles in Fig. 2a) were divided into 2, 3, 4, 10, and 15 clusters using the k-means algorithm, considering their distances from each of the 18 pre-existing meteorological stations, in order to determine the optimal number of AWS to complement the current meteorological station network in the MARJ. It was found that 10 is the optimum number of clusters, i.e., the number for which the variance within each cluster is the smallest and the variance between clusters is the largest. Figure 2b depicts the arrangement of the 10 defined clusters (centroids as white crosses) and the 18 meteorological stations (green triangles). By comparing the area of each of the 10 defined clusters to the arrangement of the 18 pre-existing meteorological stations, it was determined that seven of the ten clusters should receive a complementary meteorological station, to be initially installed at the following substations: Cluster 1 - not needed; Cluster 2 - "Cachambi substation"; Cluster 3 - not needed; Cluster 4 - "Nova Iguaçu substation"; Cluster 5 - not needed; Cluster 6 - "Pedro Ernesto substation"; Cluster 7 - "Brás de Pina substation"; Cluster 8 - "Cachamorra substation"; Cluster 9 - "Frei Caneca substation"; and Cluster 10 - "Recreio substation". Figure 2c depicts the positioning of the 25 stations, 18 installed (green triangles) and seven to be installed (blue triangles), over the areas influenced by the MARJ UHIs. Finally, the 84 Light substations and the restructured weather network in the MARJ are shown in Fig. 2d.
Previous work on the effect of high ambient temperatures on power consumption found that the peak electrical load increased by 0.45-4.6% for every degree of air temperature increase. In this study, summaries of the monthly-Light and daily-ONS loads are shown in Table 1 for the periods October 2010 to December 2020 and January 1, 2017 to December 31, 2020, respectively. Figure 3 displays the relationship between the monthly average temperature records in the MARJ and the variation (blue bar, which represents the maximum, average, and minimum) of the total monthly load of the 84 substations. Light's average total load curve (black line) has two plateaus with minor variation, one between 19 °C and 23 °C and the other above 26 °C, with monthly average values of 1.92x10 ± 67227.4 kWh and 2.70x10 ± 90198.5 kWh, respectively. Between 23 °C and 26 °C, the average Light load rises (2.30x10 ± 93113.0 kWh) due to thermal discomfort and the use of air conditioners in the MARJ.
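As a rough worked example of the jump between the two plateaus, the leading coefficients of the reported averages can be compared directly. The sketch assumes that the three averages share the same power of ten, so only the coefficients (1.92, 2.30, 2.70) are used, and that the rise between 23 °C and 26 °C is approximately linear; both are assumptions, and the variable names are illustrative.

```python
# Leading coefficients of the reported monthly average loads
# (assumed to share the same power of ten).
low_plateau = 1.92    # 19 °C to 23 °C
transition = 2.30     # 23 °C to 26 °C
high_plateau = 2.70   # above 26 °C

t_low_c, t_high_c = 23.0, 26.0

# Relative increase from the lower to the upper plateau.
relative_increase = (high_plateau - low_plateau) / low_plateau

# Average relative increase per degree, assuming a linear rise.
increase_per_degree = relative_increase / (t_high_c - t_low_c)

# Under a linear rise, the transition average should sit near the midpoint.
midpoint = (low_plateau + high_plateau) / 2.0

print(f"plateau-to-plateau increase: {relative_increase:.1%}")
print(f"average increase per degree: {increase_per_degree:.1%}")
print(f"midpoint of plateaus: {midpoint:.2f} (reported transition: {transition:.2f})")
```

Under these assumptions the load on the upper plateau is roughly 40% higher than on the lower one, and the reported transition value lies close to the midpoint of the two plateaus, which is consistent with an approximately linear rise.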

Electrical Load Model
In general, machine learning training and testing require a significant amount of time and depend on the definition of the inputs; in this case, the behavior of electrical load and temperature, which are highly correlated in both the Light and ONS databases. WEKA's regression algorithms were trained and evaluated using cross-validation, following the steps outlined in Section 2.4. The outputs were the monthly-Light load (Steps 1-3) or the daily-ONS load (Step 4), and the inputs were the maximum, minimum, and average temperatures for each dataset. Using the Experimenter option in WEKA, two training-test experiments were carried out using five different regression algorithms (Table 2), namely GaussianProcesses, LinearRegression, MultilayerPerceptron, SimpleLinearRegression, and AdditiveRegression, for monthly-Light (Experiment 1) and daily-ONS (Experiment 2) data. Experiments 1 and 2 each produced 500 trained algorithms (or models) after training the five algorithms with 10 runs.
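The count of 500 trained models per experiment is consistent with five algorithms, 10 runs, and 10 cross-validation folds per run. The number of folds is not stated in the text, so 10 folds is an assumption here; the sketch below only enumerates the resulting combinations and does not call WEKA.

```python
from itertools import product

# Algorithm names as listed in the text.
ALGORITHMS = [
    "GaussianProcesses",
    "LinearRegression",
    "MultilayerPerceptron",
    "SimpleLinearRegression",
    "AdditiveRegression",
]
N_RUNS = 10
N_FOLDS = 10  # assumption: not stated in the text


def experiment_grid(algorithms, n_runs, n_folds):
    """List every (algorithm, run, fold) combination to be trained and tested."""
    return list(
        product(algorithms, range(1, n_runs + 1), range(1, n_folds + 1))
    )


grid = experiment_grid(ALGORITHMS, N_RUNS, N_FOLDS)
print(len(grid))  # number of trained models in one experiment
```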

Conclusions
This work set out to investigate I) the current meteorological data collection network, configuring the optimal network for a better understanding of the relationship between air temperature, the heat island, and the Light electrical load in the MARJ, and II) a model for estimating the electrical load in the MARJ (84 substations) on a monthly basis and in the State of Rio de Janeiro on a daily basis. The main findings are as follows: the UHI configuration in the MARJ is polynucleated, and the behavior of electrical loads and air temperature are strongly connected; when the air temperature is below 23 °C, the electrical load in the MARJ is almost constant; between 23 and 26 °C, there is a significant increase; and above that, the load levels off again at a higher plateau; seven new AWS for collecting meteorological data should be installed in specific locations in the MARJ; the machine-learning-based models developed can estimate monthly-Light and daily-ONS loads with low average error and high correlation.
The plan for future work is to perform a hindcast using the models developed with temperature data generated by an atmospheric numerical model as input.


Figure 3 Relationship

Table 1
Summary of the electrical load data from Light and ONS.

Table 2
Algorithm hyperparameters.

The models for the monthly-Light and daily-ONS loads were evaluated using cross-validation and hindcast (for the 2018 dataset), based on the mean absolute error (MAE), root mean square error (RMSE), and correlation (Corr) statistics. The estimates replicate almost exactly the behavior of the monthly-Light and daily-ONS load records, as evidenced by the average training correlations (hindcast in parentheses) of the best algorithms of 0.89 ± 0.05 (0.91 ± 0.06) and 0.85 ± 0.09 (0.83 ± 0.07), respectively. The average MAE values of the five models correspond to a percentage error of about 4.03% and 4.83% for the daily-ONS and monthly-Light loads, respectively.
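The three evaluation statistics can be computed as in the sketch below. The MAE, RMSE, and Pearson correlation follow their standard definitions; expressing the MAE as a percentage of the mean observed load is an assumption, since the text does not state how the percentage error was obtained. The sample values are invented.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def pearson_corr(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def percentage_error(observed, predicted):
    """MAE as a percentage of the mean observed load (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * mae(observed, predicted) / mean_observed


# Invented example values.
observed_load = [100.0, 200.0, 300.0]
predicted_load = [110.0, 190.0, 310.0]

metrics = {
    "MAE": mae(observed_load, predicted_load),
    "RMSE": rmse(observed_load, predicted_load),
    "Corr": pearson_corr(observed_load, predicted_load),
    "percent": percentage_error(observed_load, predicted_load),
}
```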

Table 3
Description of the results for the daily-ONS (a) and monthly-Light (b) data.