Big data and artificial intelligence application in energy field: a bibliometric analysis

This paper uses bibliometrics to characterize the knowledge systems of big data, artificial intelligence (AI), and energy based on the Science Citation Index Expanded (SCI-E) and Social Sciences Citation Index (SSCI) of the Web of Science from 2001 to 2020. Results show that China is the country with the highest number of publications (1115), accounting for 29% of the total; however, the most influential country in the field is the USA, with an h-index of 75. The Chinese Academy of Sciences publishes the largest number of papers (104) and plays a vital role in the collaboration network. The study also reveals that IEEE Access is the most productive journal (195) in terms of the number of publications, and engineering is the most popular discipline (1526). The key theoretical foundation includes deep learning (293), big data (105), energy consumption (79), and reinforcement learning (40). The application of big data and AI in the field of energy focuses on smart grid, energy consumption, and renewable energy. Early research frontiers involve optimization and prediction of energy-related problems using the genetic algorithm and neural networks. Since 2013, energy big data have gained prominence. At present, machine learning, deep learning, and fog computing are frequently combined with energy saving. In the future, big data and AI will be utilized to promote the application of renewable energy and energy-saving renovation of buildings. These findings can help researchers understand the developmental trends and correctly grasp the research direction and method of the emerging interdisciplinary field.


Introduction
An energy system provides strong support for economic growth and social development. However, there is a contradiction between the current energy system and the climate change mitigation and sustainable development goals. The International Energy Agency (IEA) assesses the climate commitments and intentions submitted by different countries, as well as their impact on the energy sector. The IEA urges different countries to strengthen their emission-reduction commitments in order to meet the 2 °C target. The energy sector must play a key role in reducing global emissions because energy production and usage contribute two-thirds of global greenhouse gas emissions. The development of emerging technologies is the key to ultimately transforming the energy system and meeting the climate goals (IEA 2015).
Big data and artificial intelligence (AI) are emerging technologies, and their applications in the energy field are supported by governments and recognized by the industry. In the past 20 years, research on the combination of big data, AI, and energy that covers a wide range of disciplines has shown a rapid increase. AI learns from data through algorithms that allow computers to mimic human thought and consciousness. Therefore, big data is the foundation of AI. Big data cannot be completely stored in a single machine and must rely on distributed processing, distributed databases, cloud storage, virtualization, and other cloud computing processing technologies. The Internet of Things has become an important source of big data by connecting a collection of objects that need to be monitored in real time, as shown in Fig. 1. Large amounts of data are accumulating in the energy field due to the continuous application of technologies such as sensors and cloud computing (Zhou et al. 2016). Energy big data have permeated different energy-related fields, including energy production (Ifaei et al. 2018), energy consumption (Abid et al. 2017), and energy planning and policy (Giest et al. 2018). The development of big data has promoted the advancement of AI technology to a large extent. The most common application of AI in the field of energy is prediction, including load forecasting (Chou et al. 2014), power generation output forecasting (Puri et al. 2019), and energy consumption forecasting for agricultural production (Nabavi-Pelesaraei et al. 2018; Kaab et al. 2019). AI-based fault detection and diagnosis technology also plays a key role in power system fault diagnosis (Zhao et al. 2019).
Responsible Editor: Philippe Garrigues * Qunwei Wang wqw0305@126.com
There is a continuous and rapid increase in the amount of literature published on the combination of big data, AI, and energy. Previous studies reviewed the application of big data and AI in building energy systems (Daut et al. 2017, Wang et al. 2017, Mehmood et al. 2019, Zhao et al. 2019) and renewable energy (Jha et al. 2017). A systematic review of big data analytics for smart energy management was conducted from a methodological perspective (Zhou et al. 2016). In addition, applications of machine learning in smart grid (Kumar et al. 2020) and potentials of AI in the energy system (Sun et al. 2019) have been discussed. However, existing literature reviews are limited in their scope, and traditional analysis methods have not comprehensively analyzed the research hotspots and evolutionary track of the research frontier in interdisciplinary research. A timely quantitative and qualitative analysis of the research papers in this field is necessary for evaluating the growing body of knowledge. Bibliometrics provides a useful tool to achieve this aim. This paper uses bibliometrics to characterize the knowledge systems of big data, AI, and energy based on the Science Citation Index Expanded (SCI-E) and Social Sciences Citation Index (SSCI) of the Web of Science from 2001 to 2020. Specifically, this study first shows that the interdisciplinary research on big data, AI, and energy is gradually becoming a popular research topic. Then, this study illustrates cross-disciplines and collaborative networks of institutions. Finally, this study presents the research foundation, research hotspots, and the evolutionary track of the research frontier.
This study provides a comprehensive overview of research for scholars and practitioners interested in big data and AI research in energy. It can help researchers understand the developmental trends and correctly grasp the research direction and method of the emerging interdisciplinary field. It can also help energy enterprises find the right direction for technology investment in big data and AI.

Methodology
Bibliometric methods apply mathematics, statistics, and philology to comprehensively and quantitatively analyze the number of articles, authors, institutions, and discipline words across all knowledge carriers. This analysis enables data mining on the research status quo, research hotspots, and frontiers of the discipline; and supports their accurate and intuitive display using a knowledge graph. The knowledge graph is a new field of scientometrics that can illustrate the process and track knowledge development in a graphical way.

H-index
The h-index, where h stands for high citations, evaluates the academic achievements of individuals or institutions. The index was first proposed by the physicist Jorge Hirsch in 2005 according to the following definition: "A scientist has index h if h of his or her Np papers have at least h citations each, and the other (Np-h) papers have no more than h citations each" (Hirsch 2005). The goal of the h-index is to quantify the research results of researchers as independent individuals. A higher h-index of an individual demonstrates a greater influence of his/her papers. In this study, we applied the h-index to measure the comprehensive influence of an institution's or a journal's academic achievements in the field of big data, AI, and energy.
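The definition quoted above can be illustrated with a minimal Python sketch. The function name `h_index` and the citation counts are hypothetical and serve only as an illustration; they are not data from this study.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for five papers of one institution.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))
```

In this toy case, four papers have at least four citations each, while the fifth has only three, so the index is 4.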

Social network analysis
Social network analysis (SNA) is a quantitative analysis method based on mathematical methods and graph theory. In this study, we use three concepts of SNA: co-occurrence analysis, centrality, and burst detection. A knowledge graph is applied to show the conclusions of SNA visually. In this study, CiteSpace 5.6.R5 is used to obtain the knowledge graph in the fields of big data, AI, and energy.

Co-occurrence analysis
In bibliometrics, items that appear together in the same article are assumed to be correlated. The degree of correlation is measured by the co-occurrence frequency. A co-occurrence network can show the results of co-occurrence analysis more vividly in the form of a co-occurrence graph.
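As a rough illustration of how co-occurrence frequencies can be counted, the following Python sketch tallies every unordered pair of items that appear in the same article. The function name and the toy keyword lists are assumptions made for this example; CiteSpace performs its own counting internally.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(articles):
    """Count how many articles contain each unordered pair of items."""
    counts = Counter()
    for items in articles:
        # Deduplicate and sort so that each pair has one canonical order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


# Toy keyword lists (hypothetical, not taken from the data set of this study).
articles = [
    ["big data", "smart grid", "energy consumption"],
    ["big data", "smart grid"],
    ["deep learning", "energy consumption"],
]
pairs = cooccurrence_counts(articles)
print(pairs[("big data", "smart grid")])
```

Each pair with a non-zero count becomes an edge of the co-occurrence graph, and the count can be used as the edge weight.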

Centrality
The importance of a node in a co-occurrence graph is quantified according to its centrality. Centrality is used in a social network to express the degree to which a node is located in the center of the entire network. According to the calculation method, centrality can be divided into three types: degree centrality, closeness centrality, and betweenness centrality. The degree centrality of a node represents the number of other nodes that are directly connected to it. The betweenness centrality represents the number of shortest paths through a node. A higher count of shortest paths through a node results in a higher betweenness centrality value. Nodes with a betweenness centrality greater than 0.1 are considered to be key nodes in CiteSpace (Chen et al. 2014a, b). In this study, degree centrality and betweenness centrality are used to analyze the scientific research cooperation and interdisciplinary integration.
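The two measures used in this study can be sketched in plain Python as follows. The betweenness routine follows Brandes' algorithm for an unweighted, undirected graph and normalizes the scores to the range 0 to 1; this normalization and the toy network are assumptions made for illustration and may differ from the exact computation in CiteSpace.

```python
from collections import deque


def degree_centrality(graph):
    """Number of directly connected neighbours of each node (not normalized)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def betweenness_centrality(graph):
    """Brandes' algorithm; scores are divided by the number of node pairs
    that exclude the node itself, (n - 1)(n - 2) / 2."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                     # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(graph)
    # Every unordered pair is visited from both endpoints, hence no extra 1/2.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}


# Toy star-shaped network: the hub "H" cooperates with "A", "B", and "C".
graph = {"H": ["A", "B", "C"], "A": ["H"], "B": ["H"], "C": ["H"]}
dc = degree_centrality(graph)
bc = betweenness_centrality(graph)
key_nodes = [v for v, b in bc.items() if b > 0.1]  # threshold quoted above
print(dc, bc, key_nodes)
```

In the star-shaped example, every shortest path between two outer nodes passes through the hub, so the hub is the only node that exceeds the 0.1 threshold.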

Burst detection
According to the changing trends in the frequency of specific words over a period of time, burst detection was used to discover the words with a high change in word frequency in the social network. In CiteSpace, the burst words were detected using an algorithm proposed by Kleinberg in 2002 (Kleinberg 2003). The number of burst words can be adjusted by varying the duration of appearance. In this study, keywords were detected and the burst duration was set at 2 years.
Unlike a traditional literature review, this paper uses bibliometrics to study the application of big data and AI in the energy field from the perspectives of interdisciplinary integration, research foundation, research hotspots, and the trajectory of the research frontier, as shown in Fig. 2. Because this research area involves many professional disciplines, the conclusions of this research help to correctly grasp the research direction and method of the emerging interdisciplinary field with many cross-industry professional concepts.
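To make the idea of burst detection more concrete, the following Python sketch implements a simplified two-state variant of Kleinberg's batched burst model: each year is assigned either a baseline state or a burst state, and the cheapest state sequence is found by dynamic programming. The parameters `s` and `gamma`, the function names, the toy counts, and the reading of the 2-year setting as a minimum burst length are all assumptions made for illustration; the settings used inside CiteSpace are not stated in this paper.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return one state per time slice: 0 = baseline, 1 = burst."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # baseline share of articles using the word
    p1 = min(s * p0, 0.9999)           # elevated share in the burst state
    probs = (p0, p1)

    def fit_cost(state, r, d):
        # Negative log-likelihood of r hits among d articles under this state.
        p = probs[state]
        return -(math.log(math.comb(d, r)) + r * math.log(p)
                 + (d - r) * math.log(1 - p))

    up_cost = gamma * math.log(n)      # entering a burst costs; leaving is free
    cost = [0.0, math.inf]             # the automaton starts in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, choice = [], []
        for state in (0, 1):
            candidates = [cost[prev] + (up_cost if state > prev else 0.0)
                          for prev in (0, 1)]
            prev = 0 if candidates[0] <= candidates[1] else 1
            new_cost.append(candidates[prev] + fit_cost(state, r, d))
            choice.append(prev)
        cost = new_cost
        back.append(choice)
    # Trace the cheapest path backwards from the last time slice.
    state = 0 if cost[0] <= cost[1] else 1
    states = [state]
    for choice in reversed(back[1:]):
        state = choice[state]
        states.append(state)
    states.reverse()
    return states


def burst_intervals(states, min_length=2):
    """Return (start, end) index pairs of bursts lasting at least min_length slices."""
    intervals, start = [], None
    for i, state in enumerate(list(states) + [0]):
        if state == 1 and start is None:
            start = i
        elif state == 0 and start is not None:
            if i - start >= min_length:
                intervals.append((start, i - 1))
            start = None
    return intervals


# Toy data: yearly articles containing a keyword, out of 100 articles per year.
relevant = [1, 1, 1, 10, 12, 1, 1]
totals = [100] * 7
states = detect_bursts(relevant, totals)
print(states, burst_intervals(states))
```

In this toy series, the keyword frequency jumps in the fourth and fifth slices, which the sketch reports as a single burst of two slices.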

Results
The data for this study were collected from Science Citation Index Expanded (SCI-E) and Social Sciences Citation Index (SSCI). The data were retrieved from 2001 to 2020 to ensure the timeliness and accuracy of the research content. The retrieval date was May 5, 2021. The recall rate of the literature was improved by using the following phrases to search titles, keywords and abstracts: TS = (("artificial intelligence" OR "artificial intelligent" OR "big data" OR "large data") AND energy). The literature type was set as "article" and the language was set as "English". A total of 3842 articles were obtained.
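The logic of the search expression can be sketched as a simple filter in Python. The record fields and the sample records below are hypothetical, and plain substring matching is only an approximation; the matching rules of the Web of Science topic search may differ.

```python
# Phrases taken from the TS expression above.
AI_TERMS = ("artificial intelligence", "artificial intelligent",
            "big data", "large data")


def matches_topic(record):
    """True if any AI or big data phrase AND the word 'energy' occur in the
    title, abstract, or keywords of the record."""
    text = " ".join(record.get(field, "")
                    for field in ("title", "abstract", "keywords")).lower()
    return any(term in text for term in AI_TERMS) and "energy" in text


# Hypothetical records used only to exercise the filter.
records = [
    {"title": "Big data analytics for building energy management"},
    {"title": "Artificial intelligence in medical imaging"},
    {"title": "Wind power forecasting", "keywords": "energy; neural network"},
]
selected = [r["title"] for r in records if matches_topic(r)]
print(selected)
```

Only the first record satisfies both parts of the expression: the second lacks the term energy, and the third lacks any of the four phrases.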

Publication distribution of countries
From 2001 to 2020, a total of 108 countries contributed to the research relating to big data, AI, and energy by publishing academic articles in the SCI-E and SSCI databases. Fifteen countries each published more than 2% of the articles. Among the three main publishing countries, the most influential country in the field was the USA, with an h-index of 75. Although China ranked first in the number of published papers, its citation rate was lower than that of the USA, with an h-index of 68.
Countries or institutions that appeared together in an article were considered to have research partnerships. The co-country network or co-institution network is obtained when the node of the co-occurrence is a country or an institution. The centrality values obtained from the co-country network are shown in Table 1. Degree centrality reflects the breadth of a country's cooperation. France has the largest number of partner countries. France, Spain, and Italy show betweenness centralities greater than 0.1 and play a key role in the formation of cooperation between countries. Figure 3 shows the variations in the number of annual publications in the top three countries according to the number of publications. From 2001 to 2020, the country with the most publications was China (1115 articles), followed by the USA (927 articles) and the UK (290 articles). China and the USA each published a significantly higher number of papers than any other country. The USA ranked first in every year prior to 2016; starting from 2016, the number of publications in China surpassed that in the USA.

Publication distribution of institutions
From 2001 to 2020, a total of 3963 institutions contributed to the research on big data, AI, and energy by publishing academic articles in the SCI-E and SSCI databases. The co-institution network was built based on the 100 most cited items from each year, and the degree centrality was obtained. Table 2 shows that there were 30 institutions that published 20 or more papers.
The top 30 productive institutions are mainly from China and the USA: 14 are from China and eight are from the USA. The other institutions are from South Korea, Iran, the UK, Singapore, and Vietnam. Some of the top 20 productive countries do not appear in Table 2. These countries include India, Australia, Japan, Taiwan, and several European countries. The institutions in these countries that publish papers on big data, AI, and energy are relatively scattered. The Chinese Academy of Sciences (Chinese Acad. Sci.) publishes more papers and collaborates with more institutions than any other institution. Of the top five productive institutions, one belongs to the USA and the rest belong to China. According to degree centrality, Chinese Acad. Sci. has the largest number of partners. Figure 4 graphically illustrates the results of the co-institution analysis. The nodes in the purple circle indicate the key nodes, whose betweenness centrality is higher than 0.1. A line indicates a cooperative relationship between institutions. The thickness of the line indicates how closely the institutions work together: the thicker the line, the closer the cooperation. The color of the line indicates the year of the first collaboration between the institutions: the darker the color, the earlier the first collaboration. As Fig. 4 shows, the Chinese Acad. Sci. is the only key node. The University of Chinese Academy of Sciences (Univ. Chinese Acad. Sci.) is directly under the Chinese Acad. Sci., and the two institutions have a strong cooperative relationship. The University of Electronic Science and Technology of China (Univ. Elect. Sci. & Technol. China) is another important domestic publishing partner of the Chinese Acad. Sci. As the key node, the Chinese Acad. Sci. has established partnerships with overseas institutes. It has the closest relationship with Georgia Inst. Technol. in the USA. In recent years, the number of cooperative publications between the Chinese Acad. Sci. and Georgia Inst. Technol. has also increased gradually.
Huazhong University of Science and Technology is another Chinese institution that collaborates closely with overseas institutions. It has produced many co-institution publications with King Saud University (King Saud Univ.) in Saudi Arabia and Korea Advanced Institute of Science and Technology (Korea Adv. Inst. Sci. & Technol.) in South Korea. Other Chinese research institutions mainly collaborate with domestic institutions to publish papers on the application of big data and AI in the energy field. Another obvious international collaboration is between King Saud Univ. in Saudi Arabia and the University of Tehran (Univ. Tehran) in Iran.
There are three obvious core institutions in the USA: (1) Georgia Inst. Technol. is the largest publishing institution in the USA and has the highest number of collaborations with Chinese research institutions. (2) The collaborative network with the University of California at Berkeley (Univ. Calif. Berkeley) at its core includes Stanford University and the University of Michigan (Univ. Michigan) in the USA, and the University of Cambridge (Univ. Cambridge) in the UK. These institutions maintain close and stable research cooperation with one another. (3) The third cooperation network with Massachusetts Institute of Technology (MIT) at its core includes Purdue University (Purdue Univ.) in the USA, as well as Beihang University (Beihang Univ.) in China and the Islamic Azad University (Islamic Azad Univ.) in Iran.

Characteristic of disciplines/journals
A total of 3842 articles on big data, AI, and energy in the SCI-E and SSCI databases fall into 108 discipline categories. Table 3 shows the 15 most productive disciplines. A co-discipline network was built based on the 200 most cited items from each year, and the betweenness centrality values were obtained.
Degree centrality refers to the number of disciplines related to a specific discipline. A higher value indicates a higher proportion of cross-disciplinary research offered by that discipline. The top two disciplines are engineering and computer science, which are also key nodes in the co-discipline network. These two disciplines respectively encompass the research contents of big data, AI, and energy from two aspects: theoretical basis, and engineering practice and environmental benefits. The other four key disciplines open up new research avenues for ecology, chemistry, science technology, and business on big data, AI, and energy.
Articles on big data, AI, and energy were published in a total of 1235 journals belonging to the SCI-E and SSCI databases during 2001-2020. Table 4 shows the top 20 productive journals, which published 26.6% of all articles. IEEE Access is the most productive journal, followed by Energies and Applied Energy. Applied Energy publishes fewer articles than IEEE Access, but has the highest h-index among these 20 journals.

Most highly co-cited articles
Co-citation analysis is a kind of co-occurrence analysis and refers to the fact that two articles appear together in the references of a third citing article, thus indicating co-citation relationship between two references. Co-citation analysis was used to explore the research foundation with respect to the application of big data and AI in the energy field. The ten most frequently co-cited articles since 2012 are shown in Table 5.
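A co-citation count can be obtained from the reference lists of the citing articles, as in the following Python sketch. The function name and the toy reference lists are invented for illustration and do not reproduce the counts in Table 5.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how many citing articles list each pair of references together."""
    counts = Counter()
    for references in reference_lists:
        for pair in combinations(sorted(set(references)), 2):
            counts[pair] += 1
    return counts


# Each inner list is the reference list of one hypothetical citing article.
reference_lists = [
    ["LeCun 2015", "Schmidhuber 2015", "Zhou 2016"],
    ["LeCun 2015", "Schmidhuber 2015"],
    ["Beloglazov 2012", "Zhou 2016"],
]
cocited = cocitation_counts(reference_lists)
print(cocited.most_common(1))
```

Two references that are frequently cited together are likely to belong to the same research foundation, which is why the most frequent pairs are of interest here.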
These articles are mainly related to deep learning, big data, energy consumption, and reinforcement learning. Deep learning in AI is a machine-learning technique that uses a neural network as the parameter structure for optimization. It has laid a solid theoretical foundation for forecasting problems (LeCun et al. 2015; Schmidhuber 2015), fault detection (He …), and … (Silver et al. 2016) in energy systems. The development of AI cannot be separated from the support of big data technology. Big data-driven intelligent energy management places higher demands on IT infrastructure, data collection and sharing, processing and analysis, security, and privacy (Chen et al. 2014a, b; Zhou et al. 2016).
Table 3 note: TP, total number of papers on a discipline; ratio (%), proportion of the number of papers on a discipline; DC, degree centrality of a discipline in a co-discipline network; BC, betweenness centrality of a discipline in a co-discipline network.
Building energy consumption and data center energy consumption are the focus of energy consumption research. Building energy consumption is the main part of energy consumption. Because of the complexity and uncertainty of its influencing factors, the review of building energy consumption prediction using AI algorithms has become the basis of related research (Zhao et al. 2012). With the development of cloud computing technology, the problem of energy consumption in data centers is becoming more and more serious. Beloglazov et al. (2012) defined an architectural framework and principles for energy-efficient cloud computing and presented energy-aware resource provisioning and allocation algorithms.
Reinforcement learning can not only utilize the existing data, but also obtain new data by exploring the environment. The new data are used to repeatedly update and iterate the machine learning algorithm of the existing model. Deep reinforcement learning consists of all reinforcement learning algorithms that use neural network as the parameter structure to be optimized. It provides a theoretical basis for the realization of intelligent energy systems (Mnih et al. 2015).

Research hotspots
Keywords provide a high-level abstraction of the core content of an article. When two keywords appear in the same article with a high frequency, the research direction represented by the keywords is considered a current research hotspot. In the keyword co-occurrence graph, a total of 34 high-frequency keywords were obtained by cleaning and merging keywords, and retaining nodes with word frequencies higher than 20 (Table 6). The titles, keywords, and abstracts usually contain terms that are typical in biomass energy research. The keywords "Biomass Energy" and "Renewable Energy" appeared more frequently than other keywords. By analyzing the highly cited articles with these high-frequency keywords, it was found that the application of big data and AI in the field of energy included the following three aspects: smart grid, energy consumption, and renewable energy.

Smart grid
A smart grid is a complicated, interconnected power grid that is one of the important applications of big data and AI technology in the field of energy. The smart grid employs sensors, deployment strategies, smart meters, and real-time data processing to deliver a secure energy supply. Research areas relating to smart grid include integrated architecture, security issues, and key enabling technologies.
The main challenges of smart grids include efficiently managing different types of front-end intelligent devices and processing a huge amount of data received from these devices. The service and technology patterns of cloud computing can provide a computing platform for different types of computing services (Baek et al. 2015, Munshi et al. 2017). Although energy planning can be rendered more efficient, accurate, and intelligent using big data based on cloud computing, different issues involving the privacy of end users and the safe operation of critical infrastructure must be addressed (Hu et al. 2016).
AI will become a core part of the development of smart grid with the support of cloud computing and big data. Based on the big data cloud platform, a deep neural network can forecast the price and energy demand to improve the profitability of service providers and customers (Lu et al. 2019). Clustering algorithms can be applied to household consumption data and help in distinguishing user types for tariff design and switching, fault detection, and fraud detection (Granell et al. 2015). The analysis of smart meter data (Koutitas et al. 2016) and the detection and handling of missing or inaccurate data (Peppanen et al. 2015) are other research hotspots around smart grid.

Energy consumption
Research on energy consumption mainly includes the forecasting of energy consumption and energy saving. Big data technology can centrally manage decentralized network data. On this basis, improved machine learning algorithms can accurately predict and reduce energy consumption.
Energy consumption forecasting is mainly divided into methodology-oriented research and problem-oriented research on energy consumption-related issues. One of the most commonly used methods for energy consumption forecasting is time series analysis, which uses the characteristics of a certain event in the past to predict its characteristics in the future. Time series forecasting models commonly used in energy consumption forecasting problems include traditional forecasting methods, such as the seasonal autoregressive integrated moving average (SARIMA), and deep learning algorithms, such as recurrent neural networks (RNN) and long short-term memory (LSTM). By integrating a SARIMA model and a metaheuristic firefly algorithm-based least squares support vector regression model, a novel time-series sliding window metaheuristic optimization-based machine learning system can use data collected by a smart grid to effectively predict building energy consumption (Chou et al. 2016). Long short-term memory is a kind of recurrent neural network that allows the analysis of ordered data with long-term dependence. An energy consumption forecasting model that employs LSTM and an improved sine cosine optimization algorithm shows higher accuracy and stability for building energy consumption forecasting (Somu et al. 2020).
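The sliding-window idea behind such time series models can be sketched in Python as follows. The window width, the toy load readings, and the moving-average predictor are assumptions made for illustration; the predictor is a deliberately simple stand-in and is not the SARIMA, support vector regression, or LSTM model of the cited studies.

```python
def sliding_windows(series, width):
    """Split a series into (past window, next value) training pairs."""
    return [(series[i:i + width], series[i + width])
            for i in range(len(series) - width)]


def moving_average_forecast(window):
    """Stand-in predictor: forecast the next value as the mean of the window."""
    return sum(window) / len(window)


# Hypothetical hourly building load readings in kWh.
load = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
samples = sliding_windows(load, width=3)
for window, actual in samples:
    print(window, "->", moving_average_forecast(window), "actual:", actual)
```

In practice, the stand-in predictor would be replaced by a trained model that takes each window as input and the following value as the target.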
Problem-oriented research analyzes the characteristics of energy consumption-related problems and subsequently designs appropriate forecasting models. An adaptive neuro-fuzzy reasoning system can be used to predict the wheat yield according to the energy input data (Naderloo et al. 2012). The impacts of bike sharing on energy use are related to the emission of greenhouse gases, and using big data technology to evaluate the impact of energy consumption on the environment is also a topic of academic interest. In many cases, data acquisition is an obstacle to using AI for energy demand forecasting. For a limited data set, a novel hybrid dynamic approach that combines a dynamic grey model with genetic programming is more suitable for energy consumption prediction (Lee et al. 2012).
Buildings are key contributors to end-use energy demand and, therefore, the Internet of Things (IoT) has been used to monitor and control a large variety of energy-related agents in buildings for energy reduction (Terroso-Saenz et al. 2019). A sensor is an enabler for the smart industrial IoT. However, complex intelligent algorithms challenge the computation ability and energy budget of sensor nodes. Research on edge computing in the energy field has been increasing since 2018. Mobile edge nodes with relatively strong computation and storage ability can provide intelligent trust evaluation for sensor nodes. Therefore, mobile edge computing can help in scheduling the moving path of the edge nodes to decrease the moving distance and energy consumption. At the same time, edge computing can cooperate with cloud computing to transfer part of the intensive computing and storage resources to edge devices for energy consumption optimization (Huang et al. 2020).

Renewable energy
Long-term forecasting of renewable energy sources such as wind, water, and solar has become a research hotspot in academia because these sources are inherently unstable. In addition, the construction and stability of renewable energy generation systems significantly impact the application of renewable energy. Thus, the optimal configuration and maintenance of generation systems using big data and AI are receiving increasing attention.
A data fusion algorithm based on several neural networks can provide long-term wind speed forecast and avoid inefficient and less reliable results (Azad et al. 2014). The support vector machine (SVM) with radial basis function (RBF) as a kernel function can effectively predict global solar radiation (Ramedani et al. 2014).
Suitability analysis of a renewable energy power generation system is the basis of normal system operation. The first suitability map of potential geothermal sites at a global scale, based on a maximum entropy model, was presented by Coro et al. (2020). At the same time, the scale of the generation system also directly affects its later operation. Simulated annealing, as well as a combination of simulated annealing with harmony search and chaotic search, has proved to be effective in determining the most suitable size of an autonomous hybrid photovoltaic/wind turbine/fuel cell system for electrification (Maleki et al. 2016).
An energy storage system is an important component of a renewable energy system because of the inherent instability of renewable energy. Research on the application of AI in the energy storage system focuses on remote monitoring and battery maintenance. An echo state network (ESN)-based Q-learning method has been developed for optimal energy management in a solar power supply system with a battery containing a control unit (Shi et al. 2017). For micro power plants, a decentralized cloud system based on general systems and remote telemetry units (RTUs) can be used to gather data for tele-monitoring because only satellite or sparse GSM radio signals are available (Suciu et al. 2016).

Evolutionary trajectory of the research frontier
The burst words in the field of big data, AI, and energy were detected using the keyword burst detection in CiteSpace. These words show a significant change in the number of keyword citations from 2001 to 2020. The burst and decline of the research direction represented by the keywords and the evolutionary path of the research frontier were then identified, which are shown in Fig. 5. The "strength" reflects the burst intensity of each keyword. The terms "begin" and "end" represent the beginning and ending years of the burst of the keywords, respectively, corresponding to the red area in the figure. According to the beginning time of burst words, the research on big data and AI in the energy field can be divided into three stages, which are described in the following sub-sections.

First stage (2001-2012)
The early research frontiers involved the genetic algorithm, AI, and neural networks, which have the longest duration of burst. The genetic algorithm and neural networks are intelligent computing methods and belong to the category of AI. The genetic algorithm simulates the process of biological evolution in nature (Mukherjee 2002), where an individual entity evolves until the global optimal value is found. It can effectively solve highly nonlinear optimization problems that conventional optimization algorithms cannot solve. These problems include system optimization for improving energy efficiency (Caldas et al. 2002, Chen et al. 2002) and improving the energy system stability (Soliman et al. 2018). In recent years, the genetic algorithm has increasingly been combined with other AI algorithms to solve energy-related optimization problems (Krzywanski 2019; Su et al. 2019).
Compared with the genetic algorithm, neural networks were often used in prediction problems. One of the main issues was forecasting the potential of renewable energy. Due to the unstable and intermittent nature of wind power, neural networks were often used to predict wind speed in order to avoid inefficient and unreliable results (Mabel et al. 2008, Azad et al. 2014). Neural networks also proved to be an efficient and easy methodology for solar radiation measurement (Bosch et al. 2008; Li et al. 2016). Building energy consumption is an important part of energy consumption. Therefore, neural networks have emerged as a key method to address the nonlinearity of building energy data and obtain robust calculation of large and dynamic data (Yezioro et al. 2008; Platon et al. 2015; Biswas et al. 2016).

Second stage (2013-2017)
The application of big data technology in the energy field has been gaining prominence since 2013. The big data technology ecosystem is large and complex, and the related technologies emerging in the second stage include data mining and analysis (Chou and Bui 2014; Pan et al. 2015), virtualization, and MapReduce. The concept of energy big data refers to the relevant technologies and ideas for comprehensive collection, processing, analysis, and application of data in energy fields, such as electric power (Yu et al. 2016), energy-intensive industries, and renewable energy (Buffat et al. 2018; Sharifzadeh et al. 2019). The most popular research directions on energy big data include the smart power grid (Baek et al. 2015; He et al. 2017) and building energy consumption (Plageras et al. 2018), both aimed at energy saving.
The collection of energy big data is the basis of data mining and analysis in the energy field. Cloud computing environments and the MapReduce parallel programming model provide solutions for the mining and analysis of massive data (Mashayekhy et al. 2015; Feller et al. 2015).
Hadoop is an open-source implementation of MapReduce (Feller et al. 2015). However, not all enterprises have sufficient resources and ability to deal with the challenges brought by Hadoop deployment. In this case, the introduction of virtualization provides a feasible solution. Virtualization can be utilized to reduce the energy consumption of clouds by using server consolidation without degrading the performance of users' applications (Kansal et al. 2015; Shen et al. 2015).
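The MapReduce model mentioned above can be illustrated with a single-process sketch that totals electricity readings per meter. The meter names and values are invented for illustration, and the three phases run sequentially here; a real framework would distribute the map and reduce phases across many machines.

```python
from collections import defaultdict


def map_phase(record):
    # Map: emit one (key, value) pair per input record.
    meter_id, kwh = record
    yield meter_id, kwh


def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def run_mapreduce(records):
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical meter readings in kWh (assumed data).
readings = [("meter_a", 1.5), ("meter_b", 2.0), ("meter_a", 0.5), ("meter_b", 1.0)]
totals = run_mapreduce(readings)
print(totals)  # {'meter_a': 2.0, 'meter_b': 3.0}
```

Because each map call touches one record and each reduce call touches one key, both phases can in principle be parallelized without coordination, which is what makes the model suitable for massive data.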
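Server consolidation can be viewed as a bin-packing problem: virtual machines are packed onto as few physical hosts as possible so that idle hosts can be powered down. The sketch below uses the first-fit decreasing heuristic as a stand-in; it is not the algorithm of the cited works, and the loads (in percent of host CPU) and the capacity of 100 are assumptions for illustration.

```python
def consolidate(vm_loads, host_capacity):
    """Place VM loads on hosts using the first-fit decreasing heuristic."""
    hosts = []  # each host is a list of the VM loads placed on it
    for load in sorted(vm_loads, reverse=True):
        if load > host_capacity:
            raise ValueError(f"VM load {load} exceeds host capacity")
        for host in hosts:
            # Put the VM on the first host that still has room for it.
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            # No existing host fits, so switch on a new one.
            hosts.append([load])
    return hosts


# Hypothetical CPU loads in percent of one host (assumed data).
placement = consolidate([50, 30, 20, 70, 10, 20], host_capacity=100)
print(placement)  # [[70, 30], [50, 20, 20, 10]]
```

In this example six virtual machines fit on two fully loaded hosts. A practical scheduler would also have to account for memory, migration cost, and the performance guarantees that the text refers to.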

Third stage (2018-2020)
Research on big data and AI in the field of energy has been expanding since 2018. From the technical point of view, the support vector machine (SVM), convolutional neural network (CNN), and fog computing are emerging frequently. The SVM is one of the most popular machine-learning models. This algorithm performs remarkably well in solving problems in the energy field, including prediction of the heat load in heating systems (Gu et al. 2018), classification of diode rectifier (Rahnama et al. 2019), and prediction of building energy usage for heating (D'Oca et al. 2015).
As one of the representative algorithms of deep learning, the CNN has achieved outstanding results in computer vision, classification, and other fields at the expense of a huge number of model parameters and computations. Therefore, low-power and high-efficiency design is a necessary condition for productization when deploying these algorithms in actual applications (Ha et al. 2019; Pang et al. 2020). CNNs have also been applied for classification and prediction in the energy field, including short-term load forecasting of power systems (Bendaoud et al. 2020), prediction of wind speed (Ehsan et al. 2020), and nuclear fault diagnosis systems (Yao et al. 2020).
Cloud computing plays a vital role in processing a large amount of data. Compared to cloud computing, fog computing can support delay-sensitive service requests from end users, which reduces energy consumption and traffic congestion (Fu et al. 2018; Mukherjee et al. 2018).
At present, the research on big data and AI in the field of energy mainly aims to reduce energy consumption. This research can be divided into three categories. First, the microgrid is regarded in the industry as an important part of the energy Internet. It has become a strategic choice for countries around the world to realize energy transformation and achieve optimal integration of distributed generation (Mahmoud et al. 2019; Zhou et al. 2019). Second, buildings account for the largest share of energy consumption. Smart buildings integrate AI technologies with energy consumption management systems, which can manage energy consumption effectively while also increasing user comfort (Wahid et al. 2019). Third, with the development of the IoT, the sensor industry will also undergo significant expansion, further promoting the application of self-powered systems in various industries (Chen et al. 2019; Zhong et al. 2019).

Conclusion
The characteristics of big data, AI, and energy-related literature from 2001 to 2020 were examined based on the SCI-E and SSCI databases using bibliometric methods. This paper revealed that the literature in this field had become more extensive over the past 20 years. The pace of publishing in this field increased rapidly, especially starting from 2014.
Based on the country analysis, China was shown to be one of the most important contributors to the literature on big data, AI, and energy with the highest number of publications (1115), followed by the USA and the UK. However, the USA had the highest h-index. The institution analysis showed that the Chinese Academy of Sciences made the most significant contributions in the research field. It also played a vital role in the collaborative network of 30 productive institutions, developing strong cooperative relationships with other institutes, especially the University of Chinese Academy of Sciences in China and the Georgia Institute of Technology in the USA.
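For readers unfamiliar with the h-index used above to compare influence, the standard definition is the largest number h such that h publications have each received at least h citations. A minimal sketch follows; the citation counts are invented for illustration and are not taken from the study's data.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for five papers (assumed data).
print(h_index([10, 8, 5, 4, 3]))  # 4
```

The measure rewards sustained citation impact rather than volume alone, which is why a country can lead in publication count without leading in h-index.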
The study also revealed that engineering was the most popular discipline with the highest number of publications on big data, AI, and energy. Furthermore, 20 core journals contributed about 26.6% of the total number of journal publications, with a significantly higher number of papers published in the "IEEE Access" and "Energies" journals compared to other journals.
The results of co-citation analysis showed that deep learning, deep reinforcement learning, energy big data, and prediction of energy consumption had laid a solid theoretical foundation in this field. Keyword co-occurrence analysis showed that the application of big data and AI in the energy field focused on smart grid, energy consumption, and renewable energy. The burst detection of keywords revealed that the research evolved through three stages. Early research frontiers involved optimization and prediction of energy-related problems using the genetic algorithm and neural networks. Energy big data involving data mining, virtualization, and MapReduce has gained prominence since 2013. At present, machine learning, deep learning, and fog computing are frequently combined with energy saving in applications including microgrids, smart buildings, and self-powered systems. These studies will lay a solid foundation for the application of big data and AI to renewable energy and energy-saving renovation of buildings in the future.
The findings presented in this paper can help scholars understand the developmental trends for big data, AI, and energy, and correctly grasp the research direction and method of the emerging interdisciplinary field with many cross-industry professional concepts. They can also assist energy enterprises in finding the right direction for technology investment.

Author contribution
Yali Hou contributed to data curation, formal analysis, and writing of the original draft. Qunwei Wang contributed to conceptualization and methodology.

Funding
The authors are grateful for financial support provided by the National Natural Science Foundation of China (grant number: 52270183).

Data availability
The datasets used and analyzed during the current study can be provided on reasonable request.

Declarations
Ethical approval
The authors declare that the work described was original research that has not been published previously, and not under consideration for publication elsewhere, in whole or in part.

Consent to participate
There are no individual participants included in the study.

Consent for publication
The manuscript is approved by all authors for publication.

Competing interests
The authors declare no competing interests.