2.1 Research Design
Previous studies (Auffhammer & Aroonruengsawat, 2011; Auffhammer & Mansur, 2014; Davis & Gertler, 2015; Deschênes & Greenstone, 2011; Fan, Zhang, & Wang, 2017; Frederiks, Stenner, & Hobman, 2015; Li et al., 2019) suggest that factors impacting REC include temperature, precipitation, income, population, and urbanization. Since Lanzhou and Lhasa are both cities, we considered information on urbanization to be incorporated in the population data. We therefore built our model using temperature, precipitation, income, and population as predictor variables, and REC as the response variable. Unlike previous studies, which used data modeling, we adopted an algorithm modeling approach. With respect to both predictive accuracy and interpretability, we compared the results from our model with those from the previously used parameter-based models to verify and demonstrate the strength of our new modeling approach. We then tapped into the capacity of our model to holistically estimate climate impact by projecting the impact of climate change on REC under the RCP2.6, RCP4.5, and RCP8.5 scenarios.
2.2 Data
We based our estimates on 2011–2019 Lanzhou and 2014–2019 Lhasa panel data. Each dataset included monthly REC, monthly mean temperature, monthly precipitation, annual salary, and annual population. The study period was mainly defined by the availability of monthly REC data for the two cities. We obtained Lanzhou REC data from the Gansu State Grid Company (GSGC) and Lhasa REC data from the Tibet State Grid Company (TSGC). Monthly mean temperature and monthly precipitation were calculated from 2011–2019 Lanzhou and 2014–2019 Lhasa daily temperature and precipitation data provided by the China Meteorological Administration. Yearly salary and population data were provided by the National Bureau of Statistics of China.
For the prediction of future climate impact on REC in the two cities, we used future climate projections for China based on the RegCM4.6 model under the RCP2.6, RCP4.5, and RCP8.5 scenarios (Lei & Xiaoduo, 2020). This regional climate model has yielded more accurate simulations of present-day mean climatology over Northwest China than the Met Office Hadley Centre Earth System (HadGEM2-ES) global climate model (Pan, Zhang, & Huang, 2020). We extracted climate projections for the two cities from the nearest available points in the data, i.e., at the coordinates 36°N, 103.5°E for Lanzhou and 29.5°N, 91°E for Lhasa. The projection data are more reliable for Lanzhou, which is located in Northwest China, where the RegCM4.6 model performs well. Lhasa, on the other hand, is located not only in Southwest China but also on the Tibetan Plateau. While the Tibetan Plateau is found to be getting warmer and wetter (Yao et al., 2019), climate projections for the region are greatly complicated by the uplift of the plateau, and no reliable near-term climate change projection is yet available (Hu & Zhou, 2021). Since Lanzhou is also projected to become warmer and receive more precipitation (Lei & Xiaoduo, 2020), the results for Lanzhou served as a qualitative reference for Lhasa.
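The nearest-point extraction described above can be sketched as a simple lookup on a regular latitude/longitude grid. This is a minimal illustration with NumPy; the 0.5° grid resolution and the city coordinates used as lookup targets are assumptions for the example, not a description of the actual projection files.

```python
import numpy as np

def nearest_grid_point(lats, lons, target_lat, target_lon):
    """Return the (lat, lon) grid coordinate closest to the target point.

    A simplified independent nearest-neighbour lookup per axis, valid
    for a regular rectangular lat/lon grid.
    """
    lat = lats[np.argmin(np.abs(lats - target_lat))]
    lon = lons[np.argmin(np.abs(lons - target_lon))]
    return lat, lon

# Hypothetical 0.5-degree grid covering China
lats = np.arange(20.0, 50.5, 0.5)
lons = np.arange(75.0, 135.5, 0.5)

# Approximate city-centre coordinates snap to the grid points used in the study
print(nearest_grid_point(lats, lons, 36.03, 103.73))  # Lanzhou -> (36.0, 103.5)
print(nearest_grid_point(lats, lons, 29.65, 91.10))   # Lhasa   -> (29.5, 91.0)
```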
2.3 Methods
2.3.1 Algorithm modeling vs. data modeling
We adopted algorithm modeling rather than the data modeling used in previous studies. Modeling can be considered a process associating predictor variables with the response variable; the main difference between data modeling and algorithm modeling is the way in which this association is built (Breiman, 2001). Data modeling assumes that real-world data conform to a stochastic regression function incorporating predictor variables, random noise, and parameters, and attempts to estimate the response variable using functions that are then validated with goodness-of-fit tests (Breiman, 2001). Algorithm modeling, on the other hand, treats the association as unknown and attempts to identify the algorithm that best predicts the response variable from the predictor variables, with the results validated by predictive accuracy (Breiman, 2001).
Data models attempt to fit data gathered from real-world observations of systems with unknown mechanisms into a parametric model selected by a statistician, which inevitably compromises accuracy as well as insight (Breiman, 2001). Despite efforts to minimize parametric assumptions (Deschênes & Greenstone, 2011), the econometric models used in previous studies remain parameter-based data models with similar limitations. Conversely, algorithm modeling is completely data-driven, since it makes only one assumption: that the data are drawn from an unknown multivariate distribution (Breiman, 2001). The more accurate a model, the more reliable the information we can derive about the underlying data mechanism. Since algorithmic models have generally been shown to make better predictions (Breiman, 2001), we adopted algorithm modeling in our study.
Since algorithm modeling treats the association between predictor and response variables as unknown, there was no means of ensuring that the algorithm worked well with our data prior to testing. We derived a pool of seven commonly used candidate algorithms (Brownlee, 2016).
Three are basic algorithms: k-nearest neighbors (KNN), support vector regression (SVR), and classification and regression tree (CART). KNN is built on the assumption that similar instances lie in close proximity to each other. It makes estimations by measuring the distances between an instance and all other instances in the dataset, selects the specified number of instances (K) closest to the one concerned, and then votes for the most frequent label (in the case of classification) or averages the labels (in the case of regression) (Altman, 1992). SVR works by constructing a hyperplane or hyperplanes in a high-dimensional space for either classification or regression. The hyperplane with the longest distance to the nearest training data points of any class (also known as the functional margin) is considered a good separation, since the larger the margin, the lower the generalization error (Cortes & Vapnik, 1995). CART is also known as a decision tree, since it is a tree-like algorithm built by repeatedly splitting nodes into two child nodes. Each split is made on the single input variable that most improves a purity measure, such as the Gini index for classification, which measures the probability that a randomly chosen instance would be classified incorrectly (Breiman, Friedman, Olshen, & Stone, 2017). The structure of CART makes it (along with other tree-based algorithms) innately immune to correlation among predictor variables (J. Friedman, Hastie, & Tibshirani, 2001).
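The three basic candidates can be instantiated directly from scikit-learn, the implementation used in this study. A minimal sketch follows; the synthetic data and the signal in it are placeholders standing in for the real REC panels, not actual study data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(60, 4))   # 4 placeholder predictors (temp, precip, income, pop)
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

models = {
    "KNN": KNeighborsRegressor(n_neighbors=5),     # averages the 5 nearest labels
    "SVR": SVR(),                                  # margin-based hyperplane fit
    "CART": DecisionTreeRegressor(random_state=0), # recursive binary splits
}
for name, model in models.items():
    model.fit(X, y)
    print(name, round(model.score(X, y), 3))       # training R^2, for illustration only
```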
The other four are ensemble algorithms. Ensembling is a machine learning method that combines several base algorithms into one in order to decrease variance and bias or to improve predictions. We used Adaptive Boosting (AB), Random Forest (RF), Extra Trees (ET), and Gradient Boosting Machine (GBM), i.e., CART ensembles built with different ensemble methods. RF and ET train their individual trees independently and average predictions across trees, an ensemble method known as bagging. RF and ET differ in how their individual trees split as well as in how they sample the data: RF splits where the performance measure is best, whereas ET splits randomly; RF subsamples the data with replacement (bootstrapping), while ET uses the original data sample (Geurts, Ernst, & Wehenkel, 2006; Ho, 1995). AB and GBM build one tree at a time, with each new tree correcting errors made by the previous trees, an ensemble method known as boosting. AB and GBM differ in how they identify and correct those errors. AB identifies errors through high-weight data points, as each new tree up-weights observations that the previous tree handled poorly, whereas GBM identifies errors using the gradient of the loss function computed on the previous trees, which reflects how well they performed. AB corrects errors by assigning its trees different weights in the final prediction based on their performance. In contrast, GBM weights each tree equally but restricts each tree's predictive capacity through the learning rate, which represents how quickly an error is corrected from one tree to the next, for greater accuracy (J. H. Friedman, 2001; Kégl, 2013; Mason, Baxter, Bartlett, & Frean, 1999).
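All four CART ensembles are likewise available in scikit-learn with default settings. The sketch below shows their instantiation side by side; the synthetic data and reported training scores are illustrative placeholders, not study results.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, RandomForestRegressor,
                              ExtraTreesRegressor, GradientBoostingRegressor)

rng = np.random.default_rng(1)
X = rng.uniform(size=(80, 4))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.05, size=80)

ensembles = {
    "AB": AdaBoostRegressor(random_state=0),           # boosting via re-weighted data
    "RF": RandomForestRegressor(random_state=0),       # bagging with bootstrapped samples
    "ET": ExtraTreesRegressor(random_state=0),         # bagging with random split points
    "GBM": GradientBoostingRegressor(random_state=0),  # boosting on the loss gradient
}
for name, model in ensembles.items():
    print(name, round(model.fit(X, y).score(X, y), 3))  # training R^2
```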
We determined the baseline performance of the seven algorithms by running our data through each algorithm using the default hyperparameters in a Python 3.7 environment employing the scikit-learn implementation (Pedregosa et al., 2011). We used the scikit-learn Pipeline utility (Pedregosa et al., 2011) to automate the workflow and avoid data leakage. In the workflow, we standardized the data before testing each candidate algorithm. We used 80% of the data for algorithm testing and modeling and reserved 20% for validation; the training and validation datasets were split randomly. We used 10-fold cross-validation to estimate algorithmic performance as measured by the mean squared error (MSE).
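The evaluation workflow above can be sketched as follows: standardization inside a Pipeline (so that scaling is fit only on training folds, avoiding leakage), a random 80/20 split, and 10-fold cross-validation scored by MSE. The data are synthetic placeholders, and GBM stands in for any of the seven candidates.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(100, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Reserve 20% of the data for final validation
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Standardization happens inside the pipeline, re-fit within each CV fold
pipe = Pipeline([("scaler", StandardScaler()),
                 ("gbm", GradientBoostingRegressor(random_state=0))])
scores = cross_val_score(pipe, X_train, y_train,
                         cv=KFold(n_splits=10, shuffle=True, random_state=0),
                         scoring="neg_mean_squared_error")
print("mean CV MSE:", round(-scores.mean(), 4))
```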
The stochastic nature of machine learning algorithms means that model results can differ between runs. To ensure we selected the best algorithm from the candidate pool, we ran each of the seven candidate algorithms 100 times on our data and averaged both their MSE and their ranking over the 100 runs.
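The repeated-evaluation procedure can be sketched as below: each candidate is cross-validated many times, then both the MSE and the within-run ranking are averaged. For brevity this sketch uses 10 repeats and only two of the seven candidates; the data are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.uniform(size=(100, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

algos = {"KNN": KNeighborsRegressor(), "GBM": GradientBoostingRegressor()}
mses = {name: [] for name in algos}
for run in range(10):                       # the study used 100 runs
    cv = KFold(n_splits=10, shuffle=True, random_state=run)
    for name, algo in algos.items():
        s = cross_val_score(algo, X, y, cv=cv, scoring="neg_mean_squared_error")
        mses[name].append(-s.mean())

mean_mse = {name: float(np.mean(v)) for name, v in mses.items()}

# Rank algorithms within each run (1 = lowest MSE), then average the ranks
per_run = np.array([mses[name] for name in algos])   # shape (n_algos, n_runs)
ranks = per_run.argsort(axis=0).argsort(axis=0) + 1
mean_rank = dict(zip(algos, ranks.mean(axis=1)))
print(mean_mse, mean_rank)
```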
For reference, we also fitted a simple multivariate parameter-based ordinary least squares (OLS) model to the data in parallel with our algorithm model, as follows:

RECi = β0Tempi + β1Precii + β2Salaryi + β3Popi + ε

where i is the sample month; RECi is the residential electricity consumption for month i; Tempi is the mean temperature of month i; Precii is the total precipitation of month i; and Salaryi and Popi are the salary and population levels for month i, respectively, both averaged from their yearly totals. The coefficients of interest are β0, β1, β2, and β3, representing the impact of temperature, precipitation, income, and population, respectively; ε represents the bias. We did not regroup the temperature or precipitation data in this parallel model, to ensure that no more assumptions were made for it than for the algorithmic model. We tested this model alongside the machine learning algorithms for baseline performance.
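A minimal sketch of fitting such a four-predictor OLS model by least squares with NumPy follows. The synthetic monthly data and the "true" coefficients used to generate them are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60                                    # e.g. 5 years of monthly samples
temp = rng.uniform(-5, 25, n)             # monthly mean temperature
preci = rng.uniform(0, 100, n)            # monthly precipitation
salary = rng.uniform(3000, 6000, n)       # monthly-averaged salary
pop = rng.uniform(300, 400, n)            # monthly-averaged population

# Assumed data-generating coefficients, for illustration only
rec = (10.0 * temp + 0.5 * preci + 0.02 * salary + 1.0 * pop
       + rng.normal(scale=5.0, size=n))

# Design matrix: [Temp, Preci, Salary, Pop, 1]; last column absorbs the bias
X = np.column_stack([temp, preci, salary, pop, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, rec, rcond=None)
print("estimated [b0, b1, b2, b3, bias]:", np.round(beta, 3))
```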
2.3.2 Comparing predictive accuracy
Figure 1 shows the baseline performance of the candidate algorithms on the Lanzhou and Lhasa data, respectively, with the blue bars indicating the mean MSE and the pink bars showing the averaged ranking of each algorithm over the 100 runs. It is evident that the ensemble algorithm GBM was the best fit for the 2011–2019 Lanzhou data and the 2014–2019 Lhasa data, as it showed the lowest average MSEs across the board with high consistency.
To validate the model results against overfitting, we ran the predictor variables from the validation datasets through the prepared models to make predictions. The validation results, shown in Figure 2, suggested that the models were not overfitted.
2.3.3 Comparing interpretability
A robust interpretation can only be achieved with models of high predictive accuracy. Figures 1 and 2 show that GBM yielded the highest predictive accuracy for the Lanzhou 2011–2019 and Lhasa 2014–2019 data. To compare its interpretation with that of traditional parameter-based models, we present the results of OLS linear models across the different datasets in Table 1 for reference.
Table 1 Results of ordinary least squares (OLS) linear regression across Lanzhou and Lhasa year-round datasets. REC - residential electricity consumption.
| Data | Response variable | R² | Coeff. | Predictor variable | p |
| --- | --- | --- | --- | --- | --- |
| Lanzhou 2011–2019 (year-round) | REC | 0.901 | -1459.1703 | Temperature | 0.000 |
| | | | -136.8127 | Precipitation | 0.441 |
| | | | 3154.4050 | Income | 0.000 |
| | | | -277.0476 | Population | 0.023 |
| Lhasa 2014–2019 (year-round) | REC | 0.851 | -438.3596 | Temperature | 0.000 |
| | | | 48.2718 | Precipitation | 0.374 |
| | | | 62.8833 | Income | 0.863 |
| | | | 1915.3652 | Population | 0.012 |
Table 1 shows that the OLS models can explain 90.1% of the variation in the Lanzhou 2011–2019 year-round data and 85.1% in the Lhasa 2014–2019 data. Hence, it was meaningful to compare the OLS interpretation with the GBM interpretation for Lanzhou, since both were good models for the Lanzhou datasets, with GBM yielding better predictions (Fig. 1).
Despite their reputation for being “black boxes” that are difficult to interpret (Azodi, Tang, & Shiu, 2020), algorithmic models have been proven to be interpretable in ways that show not only the individual impact of predictor variables but also the joint impact of multiple predictor variables (Azodi et al., 2020; J. H. Friedman, 2001; J. H. Friedman & Meulman, 2003; Goldstein, Kapelner, Bleich, & Pitkin, 2015; Mehdiyev & Fettke, 2020; Molnar, 2020; Strobl, Boulesteix, Zeileis, & Hothorn, 2007; Zhao & Hastie, 2019). We used partial dependence plots (PDPs) and individual conditional expectation (ICE) plots to interpret the data mechanisms underlying the algorithmic models (Mehdiyev & Fettke, 2020; Molnar, 2020; Zhao & Hastie, 2019).
We used the concept of a feature in our discussion of PDPs and ICE plots. The terms “feature” and “predictor variable” are often used interchangeably, especially in machine learning; however, we distinguish between the two in our study for clarity. We use “predictor variables” to refer to the individual variables used for modeling (i.e., temperature, precipitation, income, and population). A “feature” can either be an individual predictor variable, such as temperature, or two predictor variables combined into one, e.g., climate, which comprises both temperature and precipitation.
The PDPs depict the overall dependence of model prediction on a given input feature by marginalizing over the values of all other input features; ICE plots disaggregate this average by showing the estimated functional relationship of each instance (Goldstein et al., 2015; Mehdiyev & Fettke, 2020; Molnar, 2020; Zhao & Hastie, 2019). While PDPs are useful for depicting the overall marginal effect of a given feature, they can also obscure heterogeneous relationships caused by interactions (J. H. Friedman & Meulman, 2003; Goldstein et al., 2015; Mehdiyev & Fettke, 2020; Molnar, 2020; Zhao & Hastie, 2019). For this reason, we compared ICE plots and PDPs to determine if heterogeneous relationships existed in the estimation.
We used the PDP utility offered by scikit-learn (Pedregosa et al., 2011) for ICE and PDP visualization. To compare with the OLS interpretation based on coefficients of predictor variables, we first interpreted the role of predictor variables in the GBM models, as shown in Figure 3. Each individual plot shows both the ICE curves and the PDP of a given feature, in this case a predictor variable of the corresponding model, with the y-axis showing partial dependence, i.e., the marginal effect of the feature on the REC estimation, and the x-axis showing the value of the feature. Note that the marginal effect is not the predicted REC value; rather, it is the way in which REC changes with the feature. All ICE curves generally follow the same pattern as the PDP line, which means the PDPs of the models provide a reliable interpretation of the relationships between the features and REC.
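The PDP and ICE computation behind these plots can be sketched with scikit-learn's inspection module. A fitted GBM on synthetic data stands in for the REC models; the feature indices and the signal in the data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(5)
X = rng.uniform(size=(100, 4))   # placeholder predictors (temp, precip, income, pop)
y = np.sin(3 * X[:, 0]) + 0.3 * X[:, 2] + rng.normal(scale=0.05, size=100)

gbm = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both": "average" gives the PDP curve, "individual" one ICE curve per sample
pd_result = partial_dependence(gbm, X, features=[0], kind="both")
pdp_curve = pd_result["average"][0]      # marginal effect of feature 0 on the grid
ice_curves = pd_result["individual"][0]  # one curve per instance
print(pdp_curve.shape, ice_curves.shape)
```

By construction the PDP is the pointwise mean of the ICE curves, which is exactly the comparison used in the text to check for heterogeneous relationships.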
Interpretations by GBM and OLS of the individual predictor variables are generally consistent for the temperature‒REC relationship, with the innately non-linear GBM being more informative than the OLS models. GBM suggested that precipitation plays a role in influencing REC, whereas OLS appeared to differ, since it failed to come to a conclusion on the role of precipitation, as shown by the high p-values and opposite signs of the precipitation coefficients for the two cities in Table 1. At this stage, we cannot be sure which interpretation is correct due to the multicollinearity between year-round temperature and precipitation data, as shown in Figures 4a and 4c.
Multicollinearity may be one of the reasons why precipitation has not been fully considered in previous studies. Multicollinearity can reduce the precision of estimated coefficients in linear regressions and it is common practice to remove or lessen the weight of one predictor (in this case, precipitation). Correlations between temperature and precipitation are also a challenge for GBM interpretation, since the use of PDPs and ICE plots is based on the assumption that features are not correlated (Mehdiyev & Fettke, 2020; Molnar, 2020; Zhao & Hastie, 2019).
To test our hypothesis regarding the role of precipitation during the cold season, we created a data subset consisting of data points from November, December, January, and February, i.e., the cold season only. Multicollinearity between temperature and precipitation in this subset was no longer a problem for either city, as shown in Figures 4b and 4d. For this reason, we used this subset (rather than the year-round dataset) to analyze the individual roles of temperature and precipitation.
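Building the cold-season subset amounts to filtering the monthly panel on calendar month. A minimal sketch with pandas follows; the column names and toy values are hypothetical.

```python
import pandas as pd

# Hypothetical monthly panel with a calendar-month column
df = pd.DataFrame({
    "month":  [1,    2,    5,    7,    11,  12],
    "temp":   [-5.0, -2.0, 15.0, 22.0, 2.0, -4.0],
    "precip": [1.0,  2.0,  30.0, 60.0, 3.0, 1.5],
})

# Keep only November-February observations
cold = df[df["month"].isin([11, 12, 1, 2])]
print(len(cold), "cold-season rows")
```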
Quantification of year-round climate impact, however, needs to be conducted without interference from multicollinearity. Interpretations of an algorithmic model can manage the multicollinearity between temperature and precipitation by combining them into one feature (Molnar, 2020). Unlike OLS models, which can only rely on coefficients of individual predictor variables for information, PDPs of algorithmic models can be applied to a single predictor variable as well as to a feature comprising two combined predictor variables (J. H. Friedman & Meulman, 2003; Goldstein et al., 2015; Molnar, 2020). Since the climate feature was not correlated with the other features of the model in this study, our interpretation of climate impact was not affected by multicollinearity.
It is noteworthy that the PDP for the climate feature was not a simple combination of the individual PDPs for temperature and precipitation, as shown in Figure 3. Rather, the PDP for this newly created climate feature captured changes in temperature and precipitation as well as their interactions, and showed the marginal effect of all of these as a whole, since partial dependence was calculated by marginalizing over all the other inputs of the estimation, i.e., income and population (J. H. Friedman & Meulman, 2003; Molnar, 2020).
When quantifying the marginal effect of the climate feature, we first averaged the monthly mean temperature and monthly precipitation by calendar month to represent the corresponding mean climate patterns for the cities during their respective study periods. We then obtained possible variation ranges of the climate patterns using standard deviations of monthly mean temperature and monthly precipitation for each calendar month in the two cities. Finally, by averaging the marginal effects of all climate conditions within the variation range for each month for the two cities, we obtained the average marginal effects of climate for each calendar month in the two cities.
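The monthly averaging procedure above can be sketched as follows. The climate statistics and the marginal-effect function here are illustrative assumptions (the real marginal effect comes from the PDP of the fitted GBM, and the real statistics from the observed panels); only the structure of the computation mirrors the text.

```python
import numpy as np

def marginal_effect(temp, precip):
    """Placeholder for the PDP-derived marginal effect of climate on REC."""
    return 10.0 * temp + 0.5 * precip

# Hypothetical monthly climate patterns (mean and standard deviation)
monthly_mean_t = np.array([-5, -2, 4, 11, 16, 20, 22, 21, 16, 9, 2, -4], float)
monthly_sd_t = np.full(12, 1.5)
monthly_mean_p = np.linspace(2, 40, 12)
monthly_sd_p = np.full(12, 5.0)

monthly_effect = []
for m in range(12):
    # Sample climate conditions within +/- 1 sd of the monthly pattern
    t = np.linspace(monthly_mean_t[m] - monthly_sd_t[m],
                    monthly_mean_t[m] + monthly_sd_t[m], 11)
    p = np.linspace(max(monthly_mean_p[m] - monthly_sd_p[m], 0.0),
                    monthly_mean_p[m] + monthly_sd_p[m], 11)
    tt, pp = np.meshgrid(t, p)
    # Average the marginal effect over the whole variation range
    monthly_effect.append(marginal_effect(tt, pp).mean())

print(np.round(monthly_effect, 1))
```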
We tapped further into this potential of the algorithmic model for holistically quantifying climate impact on REC to predict the impact of climate change on REC in the two cities. This was achieved by replacing climate observations with climate projections under the RCP2.6, RCP4.5, and RCP8.5 scenarios in the PDP-based quantification. As our study focused on the climate impact, we held socioeconomic conditions constant at the status-quo level of the study periods. We made projections for the mid-century decade 2036–2045, since this period is more likely to accommodate the socioeconomic assumption than more distant periods, yet is sufficiently distant for meaningful climate projections.