Probing the carbon emissions in 30 regions of China based on symbolic regression and Tapio decoupling

Against the background of energy shortages and severe air pollution, countries around the world are aware of the importance of energy conservation and emission reduction, and China is actively pursuing its emission reduction targets. In this study, we use symbolic regression to classify China's regions according to their dominant influencing factors and then calculate and analyze the decoupling relationship between carbon emissions and economic growth in each type of region. Based on our results, we divided the 30 regions of the country into six categories according to the main influencing factor: GDP (13 regions), energy intensity (EI; 7 regions), industrial structure (IS; 3 regions), car ownership (CO; 3 regions), urbanization rate (UR; 2 regions), and household consumption level (HCL; 2 regions). Ranked by average carbon emissions from high to low, the categories were Type-EI, Type-UR, Type-GDP, Type-IS, Type-CO, and Type-HCL. The decoupling coefficient of the Type-UR regions was the smallest, with expansive coupling and weak decoupling states, whereas the other regions showed expansive negative decoupling, expansive coupling, and weak decoupling. Among them, the reduction rate of the decoupling coefficient in the Type-EI regions was the largest at 6.65%. Type-EI and Type-GDP regions were the most notable contributors to emissions; on this basis, we provide policy recommendations.


Introduction
Since the Industrial Revolution of the nineteenth century, the emission of greenhouse gases, such as carbon dioxide, generated by human activities has increased sharply, exceeding the regulatory capacity of nature and leading to an increase in global average temperatures (Schandl et al. 2016). Carbon emissions are considered the main cause of global warming, such that reducing carbon emissions is a fundamental task for environmental protection and human survival and development (Shi et al. 2017). Various countries have actively been formulating schemes to control carbon dioxide emissions. Freitas and Kaneko (2011) studied the occurrence of decoupling between economic growth and energy-related CO2 emissions from 2004 to 2009 in Brazil. Bellocchi et al. (2018) evaluated the integration of electric vehicles in the Italian energy scenario and their synergy with electricity generation from renewable energy sources, identifying the impacts in terms of CO2 emissions, costs, and curtailments from a medium- to long-term perspective.
Since the reform and opening up of China, the Chinese economy has maintained sustained high growth (Quan et al. 2020). Since the 1990s, China has accelerated the urbanization process, with remarkable results (Li and Yao 2009). The GDP rose from 6,133.9 billion yuan in 1995 to 74,006.1 billion yuan in 2016, with an average annual increase of 12.59%. This rapid growth has been accompanied by substantial energy consumption and carbon dioxide emissions and has caused serious damage to the living environment. At the same time, environmental degradation threatens the normal life of human beings. Air pollution, which results from rapid economic development, population growth, and the continuous increase in energy consumption, has therefore recently garnered significant attention, underscoring the urgency of a solution. Countries must find solutions that allow economic development to rely on clean energy, thus reducing the use of fossil fuels.
The development of a low-carbon economy is an inevitable strategy for a country to achieve sustainable development. A low-carbon economy refers to a form of economic development that reduces the consumption of high-carbon energy (e.g., coal) as much as possible, thereby reducing carbon dioxide emissions, through technological innovation and transformation of the industrial structure (Mohsin et al. 2019). To achieve emission reduction targets, the Chinese government has formulated detailed policy plans. At the 2009 climate change conference in Copenhagen, the Chinese government promised to reduce its carbon intensity by 40-45% by 2020 as compared with 2005. Later, in response to the Paris Agreement, which was adopted in 2015, China committed itself to reducing its carbon intensity by 60-65% by 2030 as compared with 2005, with total carbon emissions expected to peak by 2030. To accomplish these plans, the state distributes emission reduction targets to provinces and cities. However, there are numerous differences among China's regions in terms of geographical conditions, resource endowments, and customs, and there is no unified emission reduction plan. Therefore, the government should formulate a corresponding low-carbon economic strategy based on the details of each region (Yang et al. 2018). To achieve these goals in a reasonable manner, the relationship between carbon emissions and various influencing factors must be understood in different provinces so that effective emission reduction measures can be developed.
The novelty of this study is three-fold. First, there is a lack of literature on the use of symbolic regression to analyze China's carbon emissions, and we aim to address this gap. Second, we use the advantages of symbolic regression to analyze the factors influencing China's carbon emissions. Building on previous studies, and to analyze China's carbon emissions more accurately, we introduce additional factors, such as the urbanization rate, household consumption level, and car ownership. Third, we combine symbolic regression and the Tapio decoupling method to analyze in detail the internal links between carbon emissions and economic growth in China's various regions.
This study has some limitations. Owing to issues of data availability and the actual situation of emission reduction in various provinces, the development of emission reduction technology and government financial investment have not been considered. Future research should analyze in depth the relationships among the factors influencing city-level carbon emissions and establish a city-level carbon emission assessment model for China.
The remainder of this paper is organized as follows. The "Literature review" section introduces the current research status on carbon emissions, symbolic regression, and Tapio decoupling. The "Materials and methods" section describes the methodological theory and data sources. In the "Results and discussion" section, we present the results of using the symbolic regression method to establish a carbon emissions model, classify the Chinese regions according to their main influencing factors, and then use the Tapio model for decoupling analysis. The "Conclusions" section presents the conclusions and policy recommendations.

Literature review
Many experts pay close attention to the relationships among economic growth, energy consumption, and environmental protection. From 1996 to 2013, China's carbon emissions increased by 227.43%. Economic growth was the main driver of this increase, such that the key to reducing carbon emissions is to reduce the energy intensity. Changes in the energy structure also had a slight impact on the total emission growth (Jiang et al. 2017). Urbanization has a positive effect on China's energy consumption and carbon dioxide emissions, and the effect differs with regional characteristics: urbanization has a greater impact on CO2 emissions in central China than in eastern China (Liu et al. 2011; Zhang and Lin 2012; Li and Zhou 2019). Shahbaz et al. (2016) found that economic growth mainly affected the carbon dioxide emissions in Malaysia and that urbanization can reduce carbon emissions at first, whereas once it grows to a certain level it promotes carbon dioxide emissions. For China, scholars have found that wealth and population effects are the first two factors in the growth of CO2 emissions in Xinjiang, while the energy intensity effect has become the dominant factor in curbing carbon emissions (Wang et al. 2015). The population, urbanization level, per capita GDP, industrialization level, and service level were the main reasons for the growth of CO2 emissions in Guangdong Province (Wang et al. 2013). Other studies have found that technological change was the dominant factor in the decoupling of environmental pressures from economic growth in Chongqing from 1999 to 2000, while economic structural changes had a negative impact on carbon dioxide emissions (Yu et al. 2017).
Population growth and the regional per capita GDP contributed to CO2 emissions in the Beijing-Tianjin-Hebei region, while the effect of end-use structural changes on CO2 emissions was negligible in Beijing and Hebei but greater in Tianjin (Fan et al. 2019). Using the symbolic regression method, Pan et al. (2019) found that the GDP, industrialization, technological innovation, urbanization, population, and foreign direct investment were the most common factors for carbon intensity in 34 OECD countries. Wen et al. (2018) found that an M-shaped curve between per capita CO2 emissions and per capita GDP, as well as total energy consumption, was consistent with the traditional Environmental Kuznets Curve (EKC) model, and that an L-shaped curve between energy intensity and per capita GDP performed well. Lin and Benjamin (2019) conducted a quantile analysis of Shanghai's industrial carbon emissions, finding that urbanization had a significant impact on CO2 emissions and was the leading factor for increasing carbon emissions, followed by the energy structure, industrial structure, economic growth, and energy efficiency. Enhancing energy efficiency and adjusting the energy structure are effective ways to reduce CO2 emissions. The economic output, R&D intensity, investment intensity, and energy structure of the industrial sector in Henan Province were the driving factors for the increase in CO2 emissions from 2001 to 2015. In contrast, energy intensity, R&D efficiency, and the internal industrial structure can reduce CO2 emissions. Production in the secondary industry has proven to be a major source of carbon emissions, with a relatively high emission reduction potential (Zhang and Da 2015). Lee and Oh (2006) explored the differences in CO2 emissions in APEC regions. Their main finding was that the per capita GDP and total population growth were the main factors contributing to the increase in CO2 emissions. Moreover, energy efficiency and fuel conversion have been considered the most promising areas by many experts (Kihoon and Wankeun 2006).
Symbolic regression, also known as function modeling, is based on evolutionary computation (Dong and Hao 2018). Using this method, the values of all variables required for a particular objective are transformed into a functional relationship. Schmidt and Lipson (2009) found that symbolic regression can actively search process data and find Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation. In contrast to other fitting methods, symbolic regression can find hidden functional relationships (Khu et al. 2001). Symbolic regression determines parameters and symbols at the same time and can be considered a novel and efficient method. Wen et al. (2017) probed the major factors affecting carbon emissions via symbolic regression, indicating that Beijing and Tianjin had different carbon emission targets. Yang et al. (2015a, b) used symbolic regression to probe the relationship between the GDP and carbon emissions using a straightforward regression function. On this basis, to more accurately fit the carbon emission formulas of various regions in China, they added more arithmetic symbols, such as sine, cosine, division, exponential, and power. Yang et al. (2015a, b) used EKC theory and symbolic regression to analyze data from 67 countries and obtained four models of carbon emissions and economic growth. In this study, we used symbolic regression to fit multiple equations from a new perspective. The influencing factors were screened according to their degree of impact on carbon emissions, and the Chinese regions were classified according to their largest influencing factor so as to better formulate energy conservation and emission reduction strategies in accordance with local conditions.
Decoupling, as defined by the OECD, describes the relationship between economic gain and environmental stress. Decoupling in a low-carbon economy refers to the following relationship: as the economy grows, carbon emissions begin to increase, but as the economy continues to grow, carbon emissions will decrease or even disappear. In fact, reducing the energy consumption required for economic growth is necessary. Therefore, the decoupling of carbon emissions is measured by the elasticity of carbon emissions with respect to economic growth. Luo et al. (2020) used the Tapio model to study the decoupling relationship between economic growth and resources in the Central Plains urban agglomeration, proposing relevant policies for achieving strong decoupling. Xie et al. (2020) used the Tapio decoupling model to analyze the decoupling index of CO2 emissions in the power industry, with the aim of realizing energy conservation and emission reduction in this industry. Zhang et al. (2020) analyzed the decoupling relationship between economic output and carbon emissions in the 11 provinces of the Yangtze River Economic Belt (YREB), finding that the energy intensity (EI) effect was the main driving force for the decoupling of most provinces. Based on previous research, the Tapio decoupling method can analyze the elasticity state between economic growth and carbon emissions in detail.

Details of carbon emissions calculation
The calculations were based on the carbon emission measurements presented by the IPCC Guidelines for National Greenhouse Gas Inventories (IPCC 2006). Carbon emissions can be calculated as follows:

$$C = \sum_i C_i = \sum_i E_i \times e_i \times f_i \qquad (1)$$

where $C$ denotes the total carbon emissions, $C_i$ indicates the carbon emissions of the $i$th energy type, $E_i$ is the consumption of the $i$th energy type ($10^4$ t, $10^8$ m$^3$), $e_i$ denotes the standard coal coefficient of the $i$th energy type ($10^4$ tce/t, $10^4$ tce/$10^8$ m$^3$), and $f_i$ is the carbon emissions coefficient of the $i$th fuel type ($10^4$ tcec/$10^4$ tce).
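As an illustration of the calculation described above, the following minimal Python sketch sums the product of consumption, standard coal coefficient, and carbon emission coefficient over energy types. The `Fuel` class, the function name, and all numerical values are hypothetical placeholders chosen for easy arithmetic; they are not the coefficients of Table 2.

```python
from dataclasses import dataclass


@dataclass
class Fuel:
    consumption: float         # E_i: physical consumption of energy type i
    coal_coefficient: float    # e_i: standard coal coefficient of energy type i
    carbon_coefficient: float  # f_i: carbon emission coefficient of energy type i


def total_carbon_emissions(fuels):
    """Return C = sum_i E_i * e_i * f_i over all energy types."""
    return sum(
        f.consumption * f.coal_coefficient * f.carbon_coefficient
        for f in fuels.values()
    )


# Placeholder inputs only (NOT the values used in the study).
example_fuels = {
    "coal": Fuel(consumption=100.0, coal_coefficient=0.5, carbon_coefficient=0.5),
    "natural_gas": Fuel(consumption=10.0, coal_coefficient=2.0, carbon_coefficient=0.25),
}
total = total_carbon_emissions(example_fuels)
print(total)
```

With real data, one such dictionary would be built per region and year from the eight energy sources considered in the study.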

Symbolic regression model
Symbolic regression is a functional regression method that improves on the genetic algorithm and can automatically search for a functional structure starting from a given initial population (O'Reilly 2014; Bartosz and Jarosław 2016). Compared with general clustering methods, symbolic regression does not narrow the range of candidate models by assuming a model form in advance, which decreases the probability of missing a potential model. Symbolic regression can automatically establish relationships based on the internal properties of the data, similar to a robotic scientist. It finds the functional model with the highest fitness to express this relationship and determines the parameters and structure of each regression model (Vladislavleva et al. 2009; Khu et al. 2001). In addition, the Eureqa software combines the advantages of symbolic regression methods with simplicity and intelligence, such that many studies have used it in various application areas (Can and Heavy 2011; Yang et al. 2016). The core of symbolic regression is based on Darwin's theory of evolution: important factors are selected to gradually form a model, and irrelevant factors are automatically separated from the model. This approach follows the principle of genetic programming, the process of which is illustrated in Fig. 1.
When a better-quality impact factor appears in the population, it will be repeatedly selected, replicated, and reproduced until the end of the iteration. Therefore, whether or not an influencing factor considered in this study is retained in the model depends on its contribution to the fitness of the entire model. Consequently, only the important factors are selected to gradually form the model, and unimportant factors are automatically removed from it. The frequency with which an influencing factor appears in the selected models can therefore be used to identify the factors that have a significant impact: a higher frequency indicates that a factor is more important to the model (Yang et al. 2016). This approach helps researchers determine the contribution of each factor to a model. In other words, the presence of a factor indicates relevance, while its frequency of occurrence indicates importance.
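The frequency-based notion of importance described above can be sketched as follows. This is a minimal illustration that assumes each candidate model is available as a formula string; the four example formulas are hypothetical and are not the fitted models of this study.

```python
import re
from collections import Counter

# The six influencing factors considered in the study.
FACTORS = ("GDP", "EI", "IS", "UR", "CO", "HCL")


def factor_frequency(models):
    """Count, for each factor, the number of models in which it appears."""
    counts = Counter({factor: 0 for factor in FACTORS})
    for expression in models:
        # Split into alphabetic tokens so that "cos" is not mistaken for "CO".
        tokens = set(re.findall(r"[A-Za-z_]+", expression))
        counts.update(tokens & set(FACTORS))  # one count per model and factor
    return counts


# Hypothetical formulas standing in for a set of selected models.
example_models = [
    "3.1 + 0.5*GDP",
    "GDP*EI + 2",
    "1.2*GDP + sin(UR)",
    "EI + cos(GDP)",
]
freq = factor_frequency(example_models)
print(freq.most_common())
```

In this toy example GDP appears in every model and would be ranked as the most important factor, with EI second.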
The Eureqa software was used to solve the problem of modeling complex data using symbolic regression. Eureqa is a scientific data mining software package that handles most of the computationally heavy workload inherent in automated scientific processing. Symbolic regression has the advantage of rapidly and effectively finding models and parameters with high accuracy. In general, more complex candidate solutions may be more accurate, but the possibility of overfitting also increases. A simple and effective technique to control overfitting is to limit the complexity of the model. In this study, the complexity statistic (see below) was used to represent the complexity of the candidate solution. The complexity of a model is measured by the total number of nodes in its syntax tree: the more nodes the tree contains, the more complex the model, as shown in Fig. 2.
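To make the node-count notion of complexity concrete, the sketch below represents an expression as nested tuples and counts every operator and leaf once. The tuple encoding and the example expression are assumptions made for illustration; symbolic regression tools may weight individual operators differently.

```python
def count_nodes(tree):
    """Count the nodes of a syntax tree.

    A tree is either a leaf (a variable name or a constant) or a tuple
    (operator, child_1, ..., child_n).
    """
    if isinstance(tree, tuple):
        _operator, *children = tree
        return 1 + sum(count_nodes(child) for child in children)
    return 1  # leaf: variable or constant


# Hypothetical model: 0.5 * GDP + sin(EI)
example_tree = ("add", ("mul", 0.5, "GDP"), ("sin", "EI"))
print(count_nodes(example_tree))
```

Here the tree has three operator nodes (add, mul, sin) and three leaves (0.5, GDP, EI).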
Symbolic regression does not need to make assumptions in advance; rather, it can fit a series of candidate models and their parameters based on the input and output data. The complexity (C), R-squared ($R^2$), and fitness function (Fit) are often used to evaluate the advantages and disadvantages of the candidate models. Complexity (C) represents the complexity of the candidate model. In this study, the fitting accuracy was measured by the fitness measure and $R^2$, calculated as in Eq. (2); the larger the $R^2$ value, the higher the fitting accuracy of the equation:

$$R^2 = 1 - \frac{\sum_i (Y_i - \hat{y}_i)^2}{\sum_i (Y_i - \bar{y})^2} \qquad (2)$$

where $Y_i$ is the actual value of the dependent variable, $\hat{y}_i$ indicates the predicted value, and $\bar{y}$ is the average value of the dependent variable. Fitness was used to evaluate individuals, retaining high-fitness individuals and removing low-fitness individuals, and was calculated as in Eq. (3).
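A minimal sketch of the $R^2$ measure follows, assuming the standard coefficient-of-determination definition in terms of actual values, predicted values, and the mean of the actual values. The function name and the toy inputs are illustrative only.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_residual / ss_total


# A perfect fit gives 1; always predicting the mean gives 0.
print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
```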

Tapio decoupling model
This study adopts the Tapio decoupling model, which refines the OECD decoupling model by dividing the degree of decoupling into eight categories. With this finer division, the internal relationship between the environmental pressure and economic indicators can be evaluated in more detail. Based on Tapio's decoupling methodology, this study built a decoupling framework between carbon emissions (C) and economic growth (G), which relates the decoupling coefficient to the decoupling state:

$$D_t = \frac{\%\Delta C}{\%\Delta G} \qquad (4)$$

where $D_t$ is the decoupling index, $\%\Delta C$ is the percentage change in carbon emissions, and $\%\Delta G$ is the percentage change in the economic indicator. On the basis of the definition in Tapio (2005), the eight decoupling states are listed in Table 1.
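The eight-state classification can be sketched in Python as follows. The elasticity thresholds of 0.8 and 1.2 are those commonly attributed to Tapio (2005) and are an assumption here, since Table 1 is not reproduced in this text; the function name is illustrative. The first two example calls use the growth rates reported later in this paper for the HCL regions (4.37% and 11.35% for 1995-2001; 16.99% and 12.64% for 2001-2004).

```python
def tapio_state(pct_change_c, pct_change_g, low=0.8, high=1.2):
    """Classify a period into one of the eight Tapio decoupling states.

    pct_change_c: percentage change in carbon emissions
    pct_change_g: percentage change in the economic indicator (must be non-zero)
    low, high:    elasticity thresholds (assumed 0.8 and 1.2)
    """
    if pct_change_g == 0:
        raise ValueError("decoupling index is undefined for zero economic change")
    d = pct_change_c / pct_change_g  # decoupling index D_t
    if pct_change_g > 0:             # growing economy
        if pct_change_c < 0:
            return "strong decoupling"
        if d < low:
            return "weak decoupling"
        if d <= high:
            return "expansive coupling"
        return "expansive negative decoupling"
    # shrinking economy
    if pct_change_c > 0:
        return "strong negative decoupling"
    if d < low:
        return "weak negative decoupling"
    if d <= high:
        return "recessive coupling"
    return "recessive decoupling"


print(tapio_state(0.0437, 0.1135))  # index of about 0.39
print(tapio_state(0.1699, 0.1264))  # index of about 1.34
```

Under the assumed thresholds, the first period is classified as weak decoupling and the second as expansive negative decoupling.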
A unique feature of the symbolic regression method is that it selects the most important and most common factors. Therefore, applying this method to classify the 30 regions in China, in the manner of a clustering method, can identify the importance of the influencing factors for each type of region. In addition, the Tapio decoupling method is often used to calculate the relationship between environmental pressure and economic growth in a region. In this study, we combined the advantages of these two methods and grouped the regions with the same characteristics into a single category. Based on the calculated decoupling index between environmental pressure and economic growth, a more accurate analysis of China's carbon emissions could be obtained to ultimately formulate accurate and reasonable energy-saving and emission-reduction policies for the 30 regions in China.

Data description
We selected a variety of variables as indicators that affect environmental pollution and screened the main factors according to the actual situation in each province. The data used in this study derive from the China Statistical Yearbook (CSY) and the statistical yearbooks of the 30 investigated regions from 1995 to 2016. The energy unit was the standard coal consumption in $10^4$ tce. Eight energy sources were considered in this study: coal, coke, crude oil, gasoline, kerosene, diesel, fuel oil, and natural gas. Table 2 lists the carbon emission coefficients of each of these energy types. The Tibet Autonomous Region, Hong Kong, Macao, and Taiwan were not included.
For the sake of completeness with respect to the influencing factors on carbon emissions in the country and provinces, we considered six independent variables, i.e., the GDP, EI, UR, HCL, IS, and CO. CO includes the number of civilian passenger cars, the number of civilian trucks and private passenger cars, and the number of private trucks. A detailed description of each variable is provided in Table 3. As China implements the family planning policy, the population factor changes little, and the total population and urbanization have a collinear relationship. Therefore, we did not use population factors.
Technological progress also plays an important role in reducing carbon emissions. Because technological progress cannot be measured directly, the energy intensity, which is the ratio of terminal energy consumption to economic growth, is used as the most direct reflection of its effect. That is, by introducing new energy-saving and emission-reduction technologies, energy consumption and production costs are reduced, thereby reducing energy intensity. Some studies have used research and development (R&D) expenditure to reflect the level of technology, showing that practical energy technology is more effective for energy-saving methods (Huang et al. 2021a) and that practical R&D activities are more helpful in reducing carbon intensity than invention-based R&D in practice (Huang et al. 2021b). Building on previous studies, the variables household consumption level (HCL) and car ownership (CO) were newly introduced in our model. With rapid development of the economy, people's living standards have gradually improved, and their consumption capacity has increased accordingly. Consequently, more energy-intensive equipment is purchased. HCL can explain the increase in carbon emissions caused by the increase in consumption capacity. Moreover, as people seek more convenient and faster travel, the huge increase in CO has caused a series of problems such as increased fuel consumption, air pollution emissions, and serious traffic congestion (Yang and Zhou 2020; Lian et al. 2018). The results of the unit root test and co-integration test of data from Chongqing city are presented in Tables 4 and 5, respectively.

Symbolic regression
We used symbolic regression to cluster 30 provinces, cities, and regions in China based on the influencing factors. We calculated the carbon emissions in these 30 regions from 1995 to 2016 using the carbon emissions calculation formula presented above. The research data in Fig. 3 indicate that China's carbon emissions initially showed an increasing trend, with an average annual growth rate of 5.63%. Since 2014, under the influence of the national energy conservation and emission reduction policy, the nation's carbon emissions have shown a decreasing trend, with an average annual reduction rate of 1.28%. Overall, the trend in China's carbon emissions from 1995 to 2016 was variable. The direct reason for this is the effect of the carbon emissions of various regions. The relationship between the carbon emissions in each region is analyzed below in detail. We used the Eureqa software to fit a complex formula for carbon emissions in each province. When using the symbolic regression method, common operators were selected in the model: constants, input variables, + (addition), - (subtraction), * (multiplication), / (division), sin (sine), cos (cosine), exp (exponential), and power. We further explored the factors that appeared in the best models. We used Chongqing as an example to demonstrate our symbolic regression approach. The search process for factors in the other 29 regions was consistent with that in Chongqing.
Based on a large number of alternative models, we established a Pareto front. The most suitable model is generally assumed to be located on the Pareto front, which balances the fitting accuracy and model complexity simultaneously (Jin and Bernhard 2008). After the Pareto front had been established, we focused on the models on it. To identify the pivotal factors and models, symbolic regression was repeated. The Pareto front for Chongqing is shown in Fig. 4, and the model that suited the results best was chosen. Moreover, a model that often appears on the Pareto front may be considered more likely to be associated with carbon emissions. Table 6 summarizes the results of the symbolic regression for Chongqing. Figure 5 shows the relative importance of various influencing factors derived from the symbolic regression. For example, a total of nine equations were fitted on the Pareto front in Chongqing; among these, GDP appeared in six models, whereas CO appeared in only one model. Figure 6 shows the number of times each variable appeared. The most common variable was GDP, which appeared 12 times across all models, whereas CO appeared only once. Combining Figs. 5 and 6 confirms that GDP is the most frequent factor in all equations, suggesting that GDP has the greatest impact on Chongqing's carbon emissions. The relative importance of the other factors can then be determined in the same way.
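The idea of a Pareto front over model complexity and fitting error can be sketched as follows. This is a minimal illustration under the assumption that each candidate model is summarized by a name, a complexity, and an error, both to be minimized; the candidate list is hypothetical and does not correspond to Table 6.

```python
def pareto_front(models):
    """Return the non-dominated models, sorted by complexity.

    Each model is a tuple (name, complexity, error). A model is dominated if
    another model is no worse on both criteria and strictly better on one.
    """
    front = []
    for name, complexity, error in models:
        dominated = any(
            other_c <= complexity
            and other_e <= error
            and (other_c < complexity or other_e < error)
            for _, other_c, other_e in models
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda model: model[1])


# Hypothetical candidates: (name, complexity, error).
candidates = [
    ("m1", 1, 0.9),
    ("m2", 3, 0.5),
    ("m3", 3, 0.7),  # dominated by m2 (same complexity, larger error)
    ("m4", 5, 0.5),  # dominated by m2 (same error, larger complexity)
    ("m5", 7, 0.2),
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Only m1, m2, and m5 remain on the front; the factors appearing in such models are the ones counted when ranking importance.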
Pareto fronts were obtained for every province, and the factors influencing carbon emissions were probed in each of these provinces. We analyzed the occurrence of each influencing factor in the Pareto optimal models and classified the influencing factors accordingly. The classifications were A, B, C, D, E, F, and 0. A to F rank the factors by number of occurrences from highest to lowest, and 0 denotes an unrelated factor. "-" indicates that no influencing factor appears in the respective classification. Table 7 describes the classification of each region according to the number of occurrences of influencing factors, and the R^2 values correspond to the optimal model. According to the influencing factor in classification A, we divided the 30 regions into six categories: Type-GDP, Type-EI, Type-CO, Type-IS, Type-HCL, and Type-UR. Type-GDP included Beijing, Inner Mongolia, Heilongjiang, Jiangsu, Henan, Shaanxi, Qinghai, Ningxia, Chongqing, Shanghai, Guangxi, Yunnan, and Xinjiang. Type-EI included Liaoning, Shandong, Hainan, Hebei, Hubei, Sichuan, and Shanxi. This is consistent with the conclusions of Song et al. (2020), who stated that in cities in the Bohai Rim Economic Circle, the energy intensity effect had the greatest impact on the carbon emission intensity. Type-CO included Tianjin, Anhui, and Hunan. Type-IS included Jiangxi, Zhejiang, and Jilin. Type-HCL included Fujian and Guizhou, and Type-UR included Guangdong and Gansu.
This classification is illustrated in Fig. 7, where "0" represents the regions not considered (Tibet Autonomous Region, Hong Kong, Macao, and Taiwan), "1" represents Type-GDP regions, "2" represents Type-EI regions, "3" represents Type-CO regions, "4" represents Type-IS regions, "5" represents Type-HCL regions, and "6" represents Type-UR regions. Table 8 lists the number of occurrences of each impact factor in the optimal model. The GDP was the most important factor for carbon emissions because it appeared as the primary factor (category A) in 13 regions. Liu et al. (2012) reported that among these regions, Xinjiang and Inner Mongolia, with the potential for economic development, are the most important drivers of carbon emissions. The EI was the second most important factor, appearing as the primary factor in seven regions. Moreover, in category B, the EI appeared 16 times while the GDP appeared only nine times, which confirms that the EI ranks second in its impact on carbon emissions. CO and IS together occupied the third position, each being the primary factor in three regions. The HCL and UR were each the primary factor in two regions. Previous research found that economic growth, population growth, urbanization rate, fixed asset investment, and industrialization had a positive impact on CO2 emissions in Guangdong Province, while the impact of the energy consumption structure and technological progress was negative. Finally, the HCL occurred most frequently in category 0, indicating that it was the least influential factor.
The average value of each variable in each category is listed in Table 9. Type-GDP regions had the highest average HCL (9,193.003) and UR (0.494) from 1995 to 2016. Type-EI regions had the highest average carbon emissions and the highest EI, at 11,750.820 and 1.989, respectively. Such areas should be the focus for management. High energy consumption generates significant CO2 emissions, and an effective reduction in the EI is key to reducing them. Type-IS regions had the highest percentage of IS and represent areas with mainly energy-intensive industries. Carbon emissions can be reduced by reducing the proportion of heavy industries, adjusting the industrial structure, improving traditional land fiscal policies, and attracting investment from the tertiary industry (Wang et al. 2020). Type-UR regions had the highest average GDP (17,559.020) and CO (575.868). In these areas, urban development drives residents to buy more cars, which is also a reason for the increasing carbon emissions. Figure 8 shows the detailed distribution of carbon emissions in the six types of regions from 1995 to 2016. Figure 9 depicts the average carbon emissions in each type of region from 1995 to 2016. The annual average growth rates of Type-GDP, Type-EI, Type-CO, Type-IS, Type-HCL, and Type-UR were 6.26, 5.23, 4.74, 5.10, 6.91, and 5.44%, respectively. The average carbon emissions in each type of region have remained relatively steady since 2011. The main reason for this phenomenon is likely the government's effective implementation of emission reduction policies. Ranked by average carbon emissions from high to low, the order was Type-EI, Type-UR, Type-GDP, Type-IS, Type-CO, and Type-HCL.
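The average annual growth rates quoted in this paper can be read as compound rates. This reading is an assumption, but it reproduces the 12.59% figure given in the Introduction for GDP growth from 6,133.9 billion yuan in 1995 to 74,006.1 billion yuan in 2016, as the short sketch below shows. The function name is illustrative.

```python
def average_annual_growth(start_value, end_value, years):
    """Compound average annual growth rate over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# GDP figures quoted in the Introduction (billion yuan), 1995 to 2016.
gdp_rate = average_annual_growth(6133.9, 74006.1, 2016 - 1995)
print(f"{gdp_rate:.2%}")
```

The same function could be applied to the average carbon emissions of each region type to check the growth rates listed above against Table 9 and Fig. 9.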

Decoupling analysis
Based on the cluster classification results in the previous section, we built the decoupling index between carbon emissions and economic growth for the six types of regions, analyzed the internal relationship between carbon emissions and economic growth, and on this basis proposed appropriate energy-saving and emission reduction measures for each type of region. Figure 10 indicates that the decoupling relationship between carbon emissions and economic growth in the six types of regions had a downward trend. Except for Type-EI, the other five types of regions entered expansive coupling and weak decoupling states after the enactment of the 10th Five-Year Plan (2001-2005), indicating that since then the country has increased its efforts in energy conservation and emission reduction. The decoupling coefficient of Type-GDP regions was smaller than that of Type-EI regions and showed very weak decoupling in 2000 because the economic scale of the Type-GDP regions is affected by geographical location, investment, and national policies. Thus, the growth rate of energy consumption is much smaller than the growth rate of economic development. Wu et al. (2019) found that economic output is the main factor contributing to China's carbon emissions, which is consistent with the results of the present study. In contrast, although the decoupling index of Type-EI regions decreased, it entered a weak decoupling state only after 2012. The economic growth rate was then slightly higher than the growth rate of carbon emissions, indicating that these regions were gradually moving away from promoting economic growth at the expense of energy consumption and gradually achieving the goals of energy conservation and emission reduction.
Moreover, the decoupling coefficient represented by Type-EI had the fastest and largest decline from 1995 to 2016 (Fig.  10). This was mainly because energy is the main source of CO 2 , which is based on the EI. The main areas belonging to Type-EI were Liaoning, Hebei, Shanxi, Shandong, Hubei, and other high-energy energy areas, which are characterized by high pollution and energy consumption. Industrial development promotes development of the regional economy; accordingly, the negative impact of energy intensity on changes in carbon dioxide emissions depends on the reduction of industrial energy intensity. This is consistent with the study of Wang and Jiang (2019). Therefore, the key to reducing carbon emissions is to reduce the industrial energy intensity Hao et al. 2019). Emission measures have impacted regions dominated by EI. At the same time, the decoupling index of Type-EI for carbon emissions and economic growth showed the largest decline of 6.65%; therefore, this is the most effective technique to realize energy savings and emission reduction by controlling the EI. Overall, the decoupling index of each region showed a relatively flat trend over time, which represents a weak decoupling state. The reason for such a weak decoupling may be that under the country's strong governmental macro-control, each region is actively achieving government calls for energy conservation and emission reductions by adopting a series of emission reduction measures and promoting economic growth. Interestingly, the HCL regions were weakly decoupled after 2008. Fujian and Guizhou experienced a decline in volatility. The average annual growth rate of HCL in Fujian before 1995-2001 was 7.92%; from 2001 to 2004, the average annual growth rate was 8.80%. From 1995 to 2001, the carbon emission growth rate of the HCL regions was 4.37%, and the GDP growth rate was 11.35%, whereas the carbon emission growth rate from 2001 to 2004 was 16.99%, and the GDP growth rate was 12.64%. 
Cities in the HCL regions did not strictly implement the relevant national energy-saving and emissions reduction policies during the "Tenth Five-Year Plan" period, resulting in the rapid growth of carbon emissions.
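The shift in the HCL regions can be illustrated with the Tapio elasticity, i.e., the ratio of the percentage change in carbon emissions to the percentage change in GDP. The sketch below is illustrative only: it assumes the commonly used Tapio thresholds of 0.8 and 1.2 (taken here to be those adopted in this study), and it uses the growth rates quoted above as stand-ins for the period percentage changes, which is an approximation. The function name `tapio_state` is hypothetical.

```python
def tapio_state(carbon_change_pct, gdp_change_pct):
    """Return (elasticity, decoupling state) for one period.

    Assumption: the conventional Tapio thresholds (0.8 and 1.2) apply.
    """
    if gdp_change_pct == 0:
        raise ValueError("GDP change must be non-zero")
    e = carbon_change_pct / gdp_change_pct

    if gdp_change_pct > 0:  # growing economy
        if carbon_change_pct < 0:
            state = "strong decoupling"
        elif e < 0.8:
            state = "weak decoupling"
        elif e <= 1.2:
            state = "expansive coupling"
        else:
            state = "expansive negative decoupling"
    else:  # shrinking economy
        if carbon_change_pct > 0:
            state = "strong negative decoupling"
        elif e < 0.8:
            state = "weak negative decoupling"
        elif e <= 1.2:
            state = "recessive coupling"
        else:
            state = "recessive decoupling"
    return e, state


# Growth rates of the HCL regions quoted in the text (percent).
periods = {
    "1995-2001": (4.37, 11.35),
    "2001-2004": (16.99, 12.64),
}

for label, (carbon, gdp) in periods.items():
    e, state = tapio_state(carbon, gdp)
    print(f"{label}: elasticity = {e:.2f} ({state})")
```

Under these assumptions, the elasticity rises from about 0.39 (weak decoupling) in 1995-2001 to about 1.34 (expansive negative decoupling) in 2001-2004, which matches the rapid growth of carbon emissions described for the "Tenth Five-Year Plan" period.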
In the IS regions, the decoupling coefficient of carbon emissions and the GDP showed a gradual decrease, indicating that these regions had actively been achieving national energy-saving and emission reduction targets, resulting in a weak decoupling between carbon emissions and the GDP in 2010. The main factor contributing to carbon emissions in Jilin and Jiangxi was found to be the industrial structure, followed by EI, GDP, and UR, indicating that the industrial structure in the Type-IS region is unreasonable. Jiangxi has developed non-ferrous metal smelting and porcelain production industries. Most of the energy enterprises in Jilin Province have difficulties in enterprise reform and show a lack of vitality in economic development (Zheng et al. 2019). In regions where CO was the main driving force, the decoupling index of carbon emissions and economic growth tended to be flat. The government encourages the purchase of new energy vehicles to achieve energy conservation and emissions reduction goals.
In UR regions, such as Guangdong and Gansu, carbon emissions and economic growth have been in a weak decoupling state since 2000. However, the average carbon emissions of the UR region were 75.83873 million tons, ranking second among the six region types, while the economic growth was the highest, reaching 17559.02 ten thousand yuan, representing rapid economic growth at the cost of high energy consumption. Since 1995, the elasticity between carbon emissions and economic growth in the UR region has been the smallest. Its decoupling coefficient is 1.15, with an average annual decrease of 6.87%.
If China's "extensive" development method does not undergo a fundamental change, its development process will not be able to effectively decouple from carbon emissions. Considering the coal industry's declining trend and the continued decrease in coal prices, coal may be the primary choice for energy generation in some industries, such that the energy consumption structure dominated by coal is difficult to change through market choices in the short term.

Conclusions
To achieve green economic development, the Chinese government must continue to coordinate the relationship between economic growth and environmental protection in seeking a pathway to develop a low-carbon economy. In this study, we used symbolic regression, automatic identification, and search methods to explore the factors affecting carbon emissions, including the GDP, EI, UR, IS, HCL, and CO, in 30 regions in China, using 22 years of data from 1995 to 2016. First, we found that the factors that most strongly affect carbon emissions vary from region to region. Using the symbolic regression method, we selected the factors that have the greatest impact on carbon emissions. We then divided the 30 regions of the country into six categories. The average carbon emissions in each category from high to low were in the order Type-EI, Type-UR, Type-GDP, Type-IS, Type-CO, and Type-HCL, with annual average growth rates of carbon emissions of 5.23%, 5.44%, 6.26%, 5.10%, 4.74%, and 6.91%, respectively.
Second, the clustering results were obtained based on analysis of the factors affecting carbon emissions in each region, including the main influencing factors (number of regions): Type-GDP (13), Type-EI (7), Type-IS (3), Type-UR (3), Type-CO (2), and Type-HCL (2). The average carbon emissions and energy intensity of the Type-EI region were the highest, at 117.50820 million tons and 1.989, respectively. This region type includes Liaoning, Shandong, Hainan, Hebei, Hubei, Sichuan, and Shanxi, which are all energy-consuming provinces with developed industries and are characterized by high pollution and high emissions. China as a whole has not yet reached its carbon peak or achieved carbon neutrality. Therefore, it is now important to focus on energy conservation and emission reduction in the EI region to effectively realize the national emission reduction task.
Third, the decoupling coefficient between carbon emissions and economic growth in the six region types has shown a downward trend and has gradually stabilized, indicating that the country has received substantial policy guidance in achieving carbon peaks and carbon neutrality, and has achieved great results in these respects. The decoupling coefficient of the Type-UR region was found to be the smallest, and its decoupling state is expansive coupling with weak decoupling, whereas the decoupling states in other regions are expansive negative decoupling, expansive coupling, and weak decoupling. Among them, the reduction rate of the decoupling coefficient in the Type-EI region was found to be the largest at 6.65%.
Fig. 9 Average carbon emissions in each category of region

Policy proposals
As the influencing factors had significantly different impacts on the carbon emissions in the different regions, we propose combining regional characteristics and resource endowments to develop differentiated carbon emission reduction strategies that promote the development of a low-carbon economy. The following suggestions can be made based on our results and analysis. First, differentiated regional emission reduction targets based on local conditions should be developed. That is, the setting of emission reduction targets and the formulation of policies should be determined according to the actual conditions of each region and not be unified. The focus should be on areas with high carbon emissions, such as Type-EI areas, while areas with moderate carbon emission levels, such as Type-GDP and Type-UR areas, should be regularly monitored. Areas with low carbon emissions, such as Type-CO and Type-HCL areas, can focus on economic development.
Second, technological innovation should be encouraged, with the promotion of energy-saving and emission reduction technologies. The government should increase investments in advanced energy-saving technologies, such that new technologies can be applied to production in a timely and effective manner, thereby reducing unit carbon emissions and making full use of high-tech and advanced environmental protection technologies to upgrade and optimize traditional industries. The research and development of new energy sources, such as wind and solar energy, have improved the utilization efficiency of new energy sources and reduced the use of fossil energy, thus encouraging a low-carbon economy. The government should therefore encourage companies to use new energy sources and provide preferential policies.
Third, low-carbon industries should actively be developed, industrial structures optimized, and a low-carbon economy developed. Examples include Jiangxi, Zhejiang, and Jilin. First, for traditional high-energy-consuming industries, we must not only eliminate backward production capacity, but also carry out low-carbon environmental protection transformation and upgrades. For example, through technology upgrades, the use of clean and renewable energy can be encouraged; high-carbon emission industries should be supervised and managed, energy-saving emission reduction tasks determined, and low-carbon economic development achieved. Moreover, new industrialization and the environmentally friendly development of advanced manufacturing and high-technology industries should be promoted through preferential policies from the government. Finally, the government should actively promote the development of tertiary industries and the economy, thereby reducing carbon pollution during the urbanization process.
Fourth, during urbanization, the consumption levels of residents should be reasonably controlled. High-level consumption leads to an increase in carbon emissions. To raise resident awareness of the benefits of low carbon emissions, we recommend that the government introduce relevant policies to increase low-carbon environmental protection projects and rationally allocate resident spending power, particularly in Fujian and Guizhou. The government should also encourage urban residents to start businesses, continuously improve their independent innovation capabilities, and promote the regional GDP. At the same time, especially in Guangdong and Gansu, the urban population growth rate has significantly increased, which has promoted the large-scale migration of rural populations to cities and increased regional carbon emissions.
Fifth, although CO was not the most important factor overall, it was still the main driver in some regions, indicating that this factor cannot be ignored. The government should encourage the development of environmentally friendly and energy-saving power batteries. For example, the active exploration of new electric vehicles based on hydrogen fuel cell technology is expected to achieve zero emissions. In addition, the government should vigorously promote the implementation of electric buses to gradually replace traditional fuel buses and encourage residents to buy new electric vehicles to achieve maximum carbon emission reductions.
Author contribution All authors contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Haiying Liu and Zhiqun Zhang. The first draft of the manuscript was written by Zhiqun Zhang, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Data availability The datasets generated and/or analyzed during the current study are not publicly available (part of the data comes from research) but are available from the corresponding author on reasonable request.

Declarations
Ethics approval Not applicable.
Consent to participate Not applicable.

Consent for publication Not applicable.
Competing interests The authors declare no competing interests.