Electricity Demand Prediction of Smart Grid in Smart Cities Using K-means Algorithm

The aims are to unify big data management among various departments in smart city construction, establish a centralized data collection and sharing system, and provide fast and convenient big data services for smart applications in various fields. The national grid industry is taken as the research object. A new electricity demand prediction model is proposed based on the characteristics of smart city big data. This model integrates meteorological, geographic, demographic, corporate, and economic information to form an intelligent big database. The K-means algorithm mines and analyzes the data to optimize the electricity user information. The electricity prediction model is established using the Backpropagation (BP) neural network algorithm. The electricity market is evaluated through an in-depth exploration of data relationships to verify the effectiveness of the proposed model. Results demonstrate that the K-means algorithm can significantly improve electricity user segmentation accuracy, separate the electricity consumption of different regions, and categorize different electricity users. The electricity demand network model constructed can significantly improve the prediction accuracy, and the mean error rate is 3.2671%. After the model is improved with the additional momentum factor, its training time is significantly reduced, and the mean error rate is 2.13%. The above results can provide a theoretical and practical basis for electricity demand prediction and personalized marketing, as well as development planning for the electricity sector.


Introduction
A smart city is the effective integration of real-world elements and the virtual information world via new technologies and management methods, such as the Internet of Things and cloud computing, to create visual, measurable, perceivable, interactive, and controllable intelligent approaches to production and life as well as urban operating mechanisms. As a result, resource utilization can be improved, cost and energy conservation can be strengthened, and service efficiency can be promoted [1]. Constructing smart cities reduces the negative impact on the natural environment and supports the low-carbon economy and circular economy. The remote cloud computing center completes big data analysis, establishes reasonable and efficient control strategies, and continues to innovate in a people-oriented way to provide a decision-making basis for urban management, providing various intelligent and convenient services to the public [2]. From a development content perspective, smart cities are inseparable from smart electricity consumption. With the increasingly prominent energy crisis and environmental problems, a large number of distributed energy sources and energy storage devices are connected. Moreover, the industrial society has put forward higher requirements for the reliability of power supply and power quality [3]. In China, the major advocate and construction unit of the smart grid is the State Grid Corporation of China. Under its leadership, various information technologies have been vigorously integrated to pursue the safety of the power system and the stability of power supply and distribution and to improve the efficiency of energy use, thereby reducing both energy consumption and environmental pollution in the power supply and distribution process [4].
The smart grid is designed to further optimize the production, distribution, and consumption of electric energy by obtaining more customer electricity information and to highly integrate advanced sensor technology, Internet of Things technology, information technology, communication technology, and other modern high-tech approaches to form a new type of integrated network system. Hence, information flow can be exchanged between power grid equipment, and the grid can implement advanced functions such as real-time intelligent control, remote debugging, online analysis and decision-making, as well as multi-site collaborative work depending on the specific situation [5]. Among smart grid-related technologies, power demand forecasting is very important in the power industry and has critical significance in the operation, planning, and control of the power system. Power demand forecasting is a vital foundation for ensuring the safe operation of the power system, achieving the scientific management and unified dispatching of the power grid, and formulating a reasonable development plan [6]. Electric energy consumption is affected by multiple factors. However, traditional forecasting introduces few factors as variables and only considers the internal data of the power system, such as maximum load and regional average electricity consumption, failing to fully consider the impact of changes in other factors on the forecast [7]. Therefore, studying the power demand of smart grids is of great significance to the development of the national grid.
There have been many studies on electricity demand prediction in the past few years. Classic and traditional electricity prediction methods have failed to adapt to the changing electricity market, so some new methods have been applied to electricity demand prediction. He et al. (2017) quantitatively analyzed the factors that affected electricity prediction and proposed a long-term electricity demand prediction model suitable for the new economic normal; compared with the traditional models, this model has high prediction accuracy [8]. Laouafi et al. (2017) proposed a new prediction combination method to generate very short-term electricity demand prediction under normal and abnormal load conditions and found that the system could achieve good prediction accuracy and avoid large prediction errors [9]. Mirjat et al. (2018) predicted that Pakistan's average electricity growth rate would be 8.35% in subsequent years using deep learning and electricity data from 2015 to 2017; this value was 19 times the existing base [10]. Khalifa et al. (2019) used electricity consumption data to model the Qatar electricity market; they found that more energy consumption would be generated around 2030 and proposed to improve electricity efficiency by reducing electricity consumption [11]. The above research shows that different scholars have adopted different algorithm models for electricity demand prediction. Compared with the traditional models, the performance has been greatly improved, and future electricity demand can be successfully predicted.
However, some models are based on the demand of the electricity market. During prediction, they may fail to comprehensively consider the influencing factors of electricity demand from other aspects. Therefore, establishing a prediction model by finding the critical influencing factors from appropriate methods and massive data is crucial for electricity demand prediction.
By investigating the literature on smart cities and smart grids, an electricity demand prediction model is proposed according to the actual situation; it addresses the new characteristics of current smart city development and the new needs of smart grids. A segmentation model is created by segmenting electricity users and adopting the smart city database to analyze the differences in population, legal person, economy, and geographic information. Innovatively, a new electricity prediction algorithm is proposed, realizing high-precision prediction via the correlation between data. The results can provide a theoretical and practical basis for constructing smart grids and research ideas for smart city construction.
The contributions are: (1) based on traditional customer segmentation methods, the differences are analyzed through the demographic, legal person, economic, and geographic information provided by the smart city basic database to create a customer segmentation model. The reform of China's corporate system continues to deepen, and the reform of the power industry is now on the agenda. As a result, China's power companies are no longer the same as those in the past. Previously, the power generation, power supply, and distribution of the companies were under unified management by the national government, with a fixed source of customers and stable economic benefits. Nowadays, China's power companies have also been pushed into the market, where they will face more competition. Thus, the power industry in China must respect the laws of the market like private enterprises, improve the concept of market competition, change its attitude towards customers, and strive to increase profits. (2) The characteristic relationships among the following economic indicators are extracted from the basic database of smart cities: the total electricity consumption and the area's population, GDP (Gross Domestic Product), GDP per capita, the number of industrial enterprises above designated size, and total imports and exports. Combined with power companies' internal data, a medium-term power demand forecasting model is established.

A. Methods of electricity demand prediction
The three most commonly used electricity prediction methods are the classic prediction method, the traditional prediction model, and the intelligent prediction model. (1) Classic prediction methods include the trend extrapolation method, the classified electricity demand prediction method, and the load density method.
These prediction methods are widely used, but most of them analyze the relationship between some simple variables and lack in-depth data analysis; therefore, the prediction accuracy is often low [12]. (2) Traditional prediction methods include the regression analysis method, which establishes the relationship between the dependent variables and known load data and predicts the electricity system's load using mathematical analysis. Applications of the time-series method include the exponential smoothing method and the Census-II decomposition method; the random time series methods include the state space method, the Box-Jenkins method, and the Markov method [13]. According to the given data, the relationship between the independent variable and the dependent variable is determined, and the regression equation and various parameters are found. Based on the obtained equation, the dependent variable is obtained from the existing independent variables, and finally, the electricity prediction data are obtained. When there are large random factors in historical electricity demand, the prediction effect is flawed and is greatly affected by bad data in the time series. (3) In recent years, the electricity market has become increasingly complicated. Classic prediction methods and traditional prediction methods cannot adapt to the nonlinear, multi-variable, time-varying, and random characteristics of the electricity market. Hence, some new prediction methods are used in electricity demand prediction. The laws are extracted to establish a knowledge base for reasoning and judgment based on real experience [14]. The detailed comparison results of the advantages and disadvantages are shown in Table 1.
(1) Algorithm utilization: At present, smart grid construction needs to comply with market laws and rely on its competitiveness to achieve the survival and development of enterprises. The core is to fully understand all users, improve user experience, and enhance user loyalty.
For customer segmentation, the traditional segmentation method uses a single indicator and cannot effectively divide users. With the development of smart cities and the advancement of big data technology, a large amount of data can be obtained, while data mining technology can be used to extract the required indicators to segment power customers [15]. Currently, among data mining algorithms, the K-means clustering algorithm has attracted the attention of many scholars due to its simple implementation and high efficiency.
The K-means clustering algorithm uses the distance between target data as an evaluation indicator to measure similarity. When the distance between two objects is small, the similarity between them is relatively high. A cluster in this type of algorithm usually consists of a group of relatively close objects, and the final goal is to obtain data groups that are internally compact and well separated from one another [16].
(2) Algorithm principle: the initial dataset is assumed to be $(x_1, x_2, \ldots, x_n)$, and each data unit is a p-dimensional vector (composed of p eigenvalues). The goal of the K-means clustering algorithm is to divide the original dataset into k categories $G = \{G_1, G_2, \ldots, G_k\}$ for a given number of categories k ($k \le n$). Each iteration of the K-means clustering algorithm must check whether the classification of each data unit is correct. If a data unit is classified into the wrong category, it must be reassigned. The adjusted data are clustered around k points in the space as centers, and when the next iteration starts, the value of each cluster center is updated in turn until the cluster centers no longer change, indicating that the clustering criterion function has converged and the best clustering result is obtained [17]. The algorithm's workflow is shown in Figure 2.
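The iteration described above can be illustrated with a minimal sketch in Python. It assumes Euclidean distance, random selection of k data units as the starting centers, and a small hypothetical two-dimensional toy dataset; the function name `kmeans` and its parameters are illustrative and are not taken from the study.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster p-dimensional points into k groups by repeated assign/update steps."""
    rng = random.Random(seed)
    # Pick k distinct data units as the starting cluster centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each data unit joins its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:  # centers unchanged: the criterion has converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical toy data: two well-separated groups of 2-dimensional points.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centers, labels = kmeans(data, k=2)
print(labels)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster receives label 0 depends on the random initialization.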

C. Backpropagation (BP) neural network algorithm
An Artificial Neural Network (ANN) is composed of numerous interconnected neurons. ANN has a strong nonlinear mapping ability [18]. The BP neural network is a multi-layer feedforward network trained according to the backpropagation algorithm [19]. The topological structure of the BP neural network algorithm model includes an input layer, a hidden layer, and an output layer, as shown in Figure 2.
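As an illustration of this three-layer topology, the following Python sketch computes one forward pass. It assumes a logsig (logistic sigmoid) hidden layer and a linear output layer; the weights, thresholds, and input values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def logsig(z):
    """Logistic sigmoid, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass: input layer -> logsig hidden layer -> linear output layer."""
    hidden = [
        logsig(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    output = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(W2, b2)
    ]
    return hidden, output


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output.
x = [0.5, -1.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row of weights per hidden neuron
b1 = [0.0, 0.0, 0.5]                       # hidden thresholds
W2 = [[1.0, 1.0, 1.0]]                     # one row of weights per output neuron
b2 = [0.0]
hidden, output = forward(x, W1, b1, W2, b2)
print(hidden, output)
```

The third hidden neuron receives a net input of 0.5 - 1.0 + 0.5 = 0, so its activation is exactly 0.5, which gives a quick sanity check of the sketch.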
The BP neural network is usually composed of multiple layers and multiple neurons, which are mainly divided into an input layer, a hidden layer, and an output layer [20]. The input vector is $x = (x_1, x_2, \ldots, x_n)^{T}$, and the output of the l-th layer is:

$$h^{(l)} = \left(h_1^{(l)}, h_2^{(l)}, \ldots, h_{s_l}^{(l)}\right)^{T} \quad (3)$$

In (3), $s_l$ is the number of neurons in the l-th layer. Assuming that $W_{ij}^{(l)}$ is the connection weight between the j-th neuron in the (l-1)-th layer and the i-th neuron in the l-th layer, $b_i^{(l)}$ is the threshold of the i-th neuron in the l-th layer, and $net_i^{(l)}$ is the input of the i-th neuron in the l-th layer, the following equations can be obtained, where $f$ is the activation function:

$$net_i^{(l)} = \sum_{j=1}^{s_{l-1}} W_{ij}^{(l)} h_j^{(l-1)} + b_i^{(l)} \quad (4)$$

$$h_i^{(l)} = f\left(net_i^{(l)}\right) \quad (5)$$

D. Electricity user segmentation model
(1) Overall framework: based on the above theory, the primary database of smart cities is utilized to establish a functional structure model for electricity user segmentation (Figure 3). First, a data warehouse is established. Then, relevant user segmentation data are extracted for data analysis. Afterward, data are cleaned and converted. The association analysis method is adopted for data mining, and finally, the mining results are analyzed. The primary smart city database is the foundation of the entire model. Data preprocessing guarantees real and effective mining results. The effectiveness of user segmentation depends largely on selecting user consideration standards and establishing measurements. Choosing the mining method based on actual needs is the key to the entire model.
(2) User segmentation: currently, research on electricity data analysis relies on traditional data mining and statistical methods. Here, data from the smart city public primary database are added to the electricity big data analysis to improve the algorithm's accuracy and usability. Besides, the big data method is utilized for prediction, and as many data types as possible are used. Prediction tasks that were previously impossible can be completed by exploring the association relationships in the data, which also ensures high precision. The population electricity value analysis assesses users' electricity value from a personal perspective and obtains the gathering areas of high-potential users through personal information and social insurance information.
(3) Data processing: for confidentiality concerns, the Dongfangtong TI-ETL tools and data desensitization technology are utilized to transform sensitive or confidential information through desensitization rules, thereby protecting the sensitive and private data. For some missing data, the citizen's name, gender, address, and other elements are extracted from other information. Various social security databases are integrated, and the collected information is used to roughly restore the demographic information and provide the basis for electricity user segmentation.
(4) Algorithm realization: it is divided into population electricity value information, enterprise commercial value distribution, and macroeconomic information. For population electricity value information, as shown in Figure 4, the electricity users are segmented, and the integrated resident data are arranged in a table from high to low, with the individual as the unit, according to residents' social insurance information, social security information, corporate information, and the potential and influence of electricity use. Each administrative district is taken as a unit. As for the data of the corporate legal person, the legal person's registered capital is used as the analysis target; the social insurance information is based on the insurance amount, the social security information on the subsidy amount, and the housing provident fund on the monthly payment amount [21].
The distribution of enterprise value is shown in Figure 5. The organization code is used as the search basis to match each commission, office, and bureau's source data. The evaluation and ranking are based on four dimensions: business category, registered capital, annual turnover, and the number of employees. Each administrative district is taken as a unit. For corporate legal person's data, K-means cluster analysis is performed based on turnover, registered capital, and the number of employees. For business categories, the conversion weight ratio to turnover is set to 10% for the construction industry, 100% for the manufacturing industry, 30% for the wholesale and retail industry, and 15% for the service industry [22].
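The category weights can be applied before clustering as in the short Python sketch below. It assumes that the conversion weight simply scales an enterprise's annual turnover; this interpretation, the dictionary keys, and the sample enterprises are hypothetical and serve only as an illustration.

```python
# Conversion weights from the text: share of turnover counted per business category.
CATEGORY_WEIGHT = {
    "construction": 0.10,
    "manufacturing": 1.00,
    "wholesale_retail": 0.30,
    "service": 0.15,
}


def weighted_turnover(category, turnover):
    """Scale an enterprise's annual turnover by its category weight (assumed rule)."""
    return turnover * CATEGORY_WEIGHT[category]


# Hypothetical enterprises: (business category, annual turnover in arbitrary units).
enterprises = [("manufacturing", 500.0), ("service", 2000.0), ("construction", 1000.0)]
scores = [weighted_turnover(cat, t) for cat, t in enterprises]
print(scores)
```

Under this assumption, a manufacturer with a turnover of 500 ranks above a service company with a turnover of 2000, because only 15% of the latter's turnover is counted.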
In terms of the macroeconomic information, the macro value evaluation aims to evaluate the administrative districts' electricity consumption potential for the segmentation of electricity users. The data of some districts can be accurate to the administrative streets. According to previous studies, several significant data categories are selected, such as regional Gross Domestic Product (GDP), per capita disposable income, per capita GDP, total asset investment, and trade data [23]. Therefore, the prediction accuracy of electricity information is very limited [24]. The external information of electricity companies provided by the primary database of smart cities is fully utilized. Given the impact of changes in various relevant objects on the prediction value, the prediction of regional annual electricity consumption is researched involving multiple influencing factors.
In addition to the electricity companies' own factors, the total electricity consumption is also affected by many factors such as population and corporate trends, economic conditions, energy policies, and electricity price adjustments. In particular, social and economic operations consume a large amount of electric energy, and there is a correlation between electricity consumption and economic indicators [25]. There are many statistical dimensions of economic data. The characteristic relationships among a district's social product, per capita GDP, price index, total imports and exports, and other economic indicators are combined with the internal data of electricity companies to establish a mathematical model for electricity consumption prediction [26]. The specific data indicators are as follows: (1) macroeconomic data; (2) policies and other external data; (3) regional electricity consumption data in past years; (4) electricity user segmentation data.
(2) Normalization processing: the data of an ANN are usually normalized before training to avoid neuron oversaturation, and the training effects of different transfer functions differ [27]. The input data values must be within [0,1], which is a characteristic requirement of the transfer activation function. Therefore, the original data of the network must be processed. The original data are normalized as follows:

$$x_i' = \frac{x_i - x_{i,\min}}{x_{i,\max} - x_{i,\min}} \quad (6)$$

In (6), $x_i'$ is the i-th feature parameter after normalization, $x_i$ is the original i-th feature parameter, $x_{i,\min}$ is the minimum value of the i-th feature parameter, and $x_{i,\max}$ is the maximum value of the i-th feature parameter.
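The min-max normalization described above, together with the inverse step needed to bring predictions back to the original units, can be sketched in Python as follows. The function names and the sample values are hypothetical, and mapping a constant feature to 0.0 is a choice made here for the sketch.

```python
def min_max_normalize(values):
    """Rescale one feature column to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map it to 0.0 (a choice made here).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Invert the scaling to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical values of one feature parameter (arbitrary units).
consumption = [120.0, 150.0, 180.0, 240.0]
scaled = min_max_normalize(consumption)
restored = denormalize(scaled, min(consumption), max(consumption))
print(scaled)    # [0.0, 0.25, 0.5, 1.0]
print(restored)  # [120.0, 150.0, 180.0, 240.0]
```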
(3) Parameter determination: after preliminary experiments, a three-layer network model structure with one hidden layer is determined. The number of hidden neurons is 18, and the logsig transformation function is used. The number of neurons in the second layer is the same as the number of output variable vectors, and the output layer uses a pure linear transformation function. There are 65 input feature parameters; that is, the number of input layer nodes is 65, and the number of output layer nodes is 5. Generally, increasing the number of nodes in the hidden layer reduces the network's training error more than increasing the number of hidden layers. A BP neural network with a three-layer structure can map the n-dimensional input layer to the m-dimensional output layer. Therefore, the number of hidden layers in the network is determined as 1. When applying a neural network for electricity prediction, the number of hidden layer neurons is selected with reference to equations (7) and (8), in which h is the number of nodes in the hidden layer, m is the number of nodes in the input layer, and n is the number of nodes in the output layer. After a comprehensive comparison of experiments, the number of nodes in the hidden layer is selected as 18. The S-type tangent tansig function is selected as the activation function for hidden layer neurons, and the activation function for output layer neurons is the S-type logarithmic logsig function.
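For orientation, the Python sketch below evaluates two rules of thumb that are widely quoted for sizing a hidden layer, using the node counts stated above (m = 65, n = 5). Whether these rules coincide with the study's reference equations is an assumption; they are shown only because both are compatible with the chosen value of 18.

```python
import math

m, n = 65, 5  # input and output node counts given in the text

# Assumed rules of thumb; they may differ from the study's own reference equations.
h_geometric = math.sqrt(m * n)                      # h = sqrt(m * n)
h_range = [math.sqrt(m + n) + a for a in (1, 10)]   # h = sqrt(m + n) + a, a in [1, 10]

print(round(h_geometric, 2))            # 18.03
print([round(h, 2) for h in h_range])   # [9.37, 18.37]
```

Such rules only suggest a starting range; the final choice still rests on experimental comparison, as stated in the text.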
(4) Data source: the streaming data in the power grid come from smart meters, PMUs, and various sensors. These data are large in scale, diverse in structure, and fast. To accurately obtain the electricity consumption data of users' different electric equipment, the electric power company has installed a large number of smart meters, which send real-time electricity consumption information to the grid every 5 minutes. The real-time collection of streaming data requires fast collection speed, high reliability, real-time monitoring of data changes, and simple data processing. Therefore, the collection system is a distributed, reliable, and highly available system of massive log aggregation, which can monitor and receive data from the client and send it out. When a node fails, the log file is transferred to other nodes without loss, ensuring data integrity.

F. System improvement and verification
(1) System improvement: the improvement takes into account the influence of changing trends on the error surface. The BP neural network may fall into a local minimum, which the additional momentum helps to prevent. The adjustments of the weight and threshold with the additional momentum factor are given by equations (9) and (10), in which w is the weight vector, k is the number of training iterations, mc is the momentum factor, $\eta$ is the learning rate, $\nabla E$ is the gradient of the error function, and the remaining symbols are the correlation coefficients.
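A minimal Python sketch of a weight update with an additional momentum term is given below. It assumes the commonly used form in which the new step is mc times the previous step minus (1 - mc) times the learning rate times the gradient; this form, the toy quadratic error function, and all parameter values are assumptions for illustration and may differ from the study's equations.

```python
def momentum_step(w, prev_delta, grad, lr=0.01, mc=0.9):
    """One update: blend the previous step with the new gradient-descent step."""
    delta = [mc * d - (1.0 - mc) * lr * g for d, g in zip(prev_delta, grad)]
    new_w = [wi + di for wi, di in zip(w, delta)]
    return new_w, delta


def grad_quadratic(w):
    """Gradient of the toy error E(w) = sum(w_i^2), standing in for the BP error."""
    return [2.0 * wi for wi in w]


w = [1.0, -2.0]       # hypothetical starting weights
delta = [0.0, 0.0]    # no previous step before the first iteration
for _ in range(2000):
    w, delta = momentum_step(w, delta, grad_quadratic(w), lr=0.5, mc=0.9)
print(w)
```

Because each step retains part of the previous one, the update keeps moving through flat or slightly rising regions of the error surface, which is the mechanism by which momentum reduces the risk of stalling in a shallow local minimum.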
(2) Clustering algorithm evaluation: here, the K-means clustering algorithm uses the sum-of-squared-error criterion function to evaluate the clustering performance. X represents the given dataset, and each data unit is a p-dimensional feature vector. The data are to be divided into k categories. The algorithm randomly selects k data units as the starting cluster centers, analyzes the distance from each data unit to each cluster center, and assigns the data unit to the group of the nearest cluster center. It is supposed that X contains k data groups $X_1, X_2, \ldots, X_k$; the numbers of data units in the data groups are $m_1, m_2, \ldots, m_k$; and the cluster centers of the data groups are $n_1, n_2, \ldots, n_k$. The squared error equation used [28] is:

$$E = \sum_{i=1}^{k} \sum_{x \in X_i} \lVert x - n_i \rVert^{2}$$

(3) BP neural network training: the number of training iterations of the neural network is 10,000, the expected training error is 0.02, and the learning rate is 0.01. The pre-processed data matrix is imported into MATLAB and normalized. The newff function is employed to establish a BP neural network model. After the calculation, the data go through denormalization, and the MATLAB toolbox performs calculations to obtain the relative error percentage and the predicted electricity demand value.
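The squared-error criterion used to evaluate the clustering can be computed as in the following Python sketch, which sums the squared Euclidean distance from every data unit to the center of its own group. The function name and the toy groups are hypothetical.

```python
def sum_squared_error(groups, centers):
    """Sum over groups of squared Euclidean distances from members to their center."""
    total = 0.0
    for members, center in zip(groups, centers):
        for x in members:
            total += sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return total


# Hypothetical data groups and their cluster centers (the group means).
groups = [[(1.0, 1.0), (3.0, 1.0)], [(8.0, 8.0), (8.0, 10.0)]]
centers = [(2.0, 1.0), (8.0, 9.0)]
print(sum_squared_error(groups, centers))  # 4.0
```

Each of the four points lies at distance 1 from its center, so the criterion equals 4.0; a smaller value indicates more compact groups.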

A. Results of electricity user segmentation
Different electricity consumption areas can be divided through the above user segmentation model, as shown in Table 2 and Figure 1. The four companies in District A are in the high electricity consumption area of their administrative district. The medium electricity consumption area is the office area of District B. The area with low electricity consumption is the residential electricity area of District C. This is consistent with the actual situation. Then, all areas are divided into residential electricity and commercial electricity areas. According to the model, all residential users and commercial users in the city can be subdivided into four levels, P1-P4, from the highest to the lowest electricity consumption. The number of residents with the highest electricity consumption reached 25,678. The number of commercial users with the highest electricity consumption reached 298. Such results also confirm the effectiveness of the proposed segmentation model.

B. Performance analysis of electricity demand prediction
Figure 7 shows the results of electricity demand prediction calculated by the BP neural network model. The electricity demand prediction results in different areas differ little from the actual values: the largest relative error is 5.129%, the smallest is 2.294%, and the average relative error is 3.2671%. Hence, the BP neural network has a good prediction effect. As shown in Figure 8, the model's training error is analyzed, and the results suggest that the model performance tends to be stable when the training time reaches 5,000 s. Compared with the literature, this training time is very long, so the model cannot satisfy the actual demand.
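The relative error statistics reported above can be computed as in the following Python sketch. The actual and predicted values are hypothetical and are not the study's data; the sketch only shows how the maximum, minimum, and average relative errors are obtained.

```python
def relative_errors(actual, predicted):
    """Absolute relative error of each prediction, in percent."""
    return [abs(p - a) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


# Hypothetical actual and predicted demand for three areas (arbitrary units).
actual = [100.0, 200.0, 400.0]
predicted = [105.0, 196.0, 412.0]
errors = relative_errors(actual, predicted)
mean_error = sum(errors) / len(errors)
print(max(errors), min(errors), mean_error)
```

For these values the individual errors are about 5%, 2%, and 3%, giving an average of roughly 3.33%.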
C. Performance analysis results of the improved algorithm
As shown in Figures 9 and 10, after optimization, the improved BP algorithm has a smaller average relative error, 2.13%, and a faster training speed than the unimproved BP neural network. Besides, the improved algorithm reaches a stable state at 2,000 s, and its training accuracy is higher, which is more advantageous in predicting electricity demand. Hence, the improved algorithm meets the basic requirements of the electricity company for electricity prediction and has particular practical application value.

E. Performance comparison of different models
As shown in Figure 11, the proposed model is compared with the latest models. The accuracy of the BP neural network used in the literature [29] is relatively high and remains at 79.63%. Compared with methods in other literature, the proposed model has the best performance. The accuracy rate of the model is 85.25%, and the average value is also the largest, which is 83.72%. Hence, the proposed model is significantly better than the latest models in terms of algorithm performance.

Discussion And Analysis
The comprehensive application of big data in the smart city system can provide trend judgments and information sharing, promoting the innovative management and interconnection of cities, the healthy competition of various industries, and the sustainable development of the entire society. Besides, with the increasingly prominent energy crisis and environmental problems, a large number of distributed energy sources and energy storage devices are connected, and the industrial society has put forward higher requirements for power reliability and power quality. The load forecast of the power system is of great significance to the dispatch of the power system; it is the fundamental basis for formulating power generation and transmission plans and an influential aspect of the modern development of the power market. Improving the accuracy of power system load forecasting can effectively improve the economic benefits of the power sector and promote the safe and economic operation of the power grid. Here, the demand forecasting problem of the power system is studied by consulting a large number of relevant domestic and foreign documents.
The characteristics and significance of power system demand forecasting are introduced in the context of smart city big data. Traditional power data analysis methods are mostly based on limited sample data: new algorithms are proposed, or existing algorithms are improved, based on proprietary data in this industry and field to achieve higher prediction accuracy and faster processing speed. The big data method for forecasting requires as many types of data as possible. Previously impossible forecasting tasks are completed by exploring the association relationships in the data, thereby ensuring high accuracy. During power information forecasting, the focus of work is shifted from the research of complex algorithms to the preparation of big data acquisition. On this basis, the existing methods of power demand forecasting are compared and their respective characteristics are analyzed, which is also reported in the relevant literature [33]. After data mining, the K-means clustering analysis algorithm is used to establish a power customer segmentation model for the in-depth processing of the basic data of smart cities. Through cleaning, conversion, and the establishment of new data warehouses, the intrinsic characteristics of the data are identified, and the intrinsic value of the data is tapped [34]. Compared with traditional customer segmentation methods, relying on the basic data of smart cities can describe customer behavior more comprehensively and accurately, which provides a reference for power companies to formulate appropriate marketing strategies and conduct other in-depth research.

Conclusion
The establishment and mining of big data are utilized to establish a smart grid user segmentation model through the K-means algorithm for electricity demand prediction. Through cleaning, conversion, and establishing new data warehouses, the intrinsic value of data is deeply explored. Compared with the traditional user segmentation method, the proposed model can describe user behavior more comprehensively and accurately by relying on the primary data of smart cities. Also, the results of electricity user segmentation are incorporated into the input samples. The BP neural network method is utilized to establish a model for electricity demand prediction on the electricity system. The input data are normalized, and the results prove the validity of the model. Although a suitable electricity demand prediction model has been established, several shortcomings are found. First, there is no mature multi-indicator prediction model for the relationship between time and social factors and electricity demand. The electricity demand prediction model constructed is still in the research phase and is based on actual cases, so it lacks universal applicability. Second, modeling the impact of annual weather conditions on electricity load is demanding. In electricity load prediction, the analysis of electricity consumption is accurate when based on weather changes. However, annual weather changes are very macroscopic and challenging to measure accurately in local areas. Hence, generating accurate digital reports on economic activities, people's lives, trade, and transportation in particular areas is challenging. Therefore, these factors are difficult to reflect in the electricity prediction model. In the future, these aspects will be explored in depth to improve the proposed electricity demand prediction model continuously.

Declarations
Compliance with Ethical Standards
Conflict of Interest: All authors declare that they have no conflict of interest.
Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent was obtained from all individual participants included in the study.