Examining energy inequality under the rapid residential energy transition in China through household surveys

Since 2013, China has initiated a rapid energy transition that replaces traditional solid fuels with modern clean energy. Despite the tremendous success of the energy transition, its impacts on household energy costs and associated energy inequality remain largely unexplored. Here we use data from a large nationwide household survey to investigate these trends. We find that about two-fifths (43.0%) of surveyed households switched from traditional solid fuels to clean energy during 2013–2017. However, 56.1% to 61.0% of them were extremely poor or poor households, causing deep concern about an increasing household energy burden. Accordingly, the share of surveyed households in energy poverty increased from 30.1% to 34.2%. Despite the declining inequality in energy cost, a growing inequality in energy burden was revealed during 2013–2017. Our results demonstrate that the energy burden on rural households increased due to the dramatic rise in the cost of clean energy, while urban households tend to spend a lower and decreasing proportion of their income on energy.
Between 2013 and 2017, China enacted a series of policies to improve air quality by promoting a switch to cleaner fuels for households. This study examines the changes in energy cost and associated energy burden across regions and income groups during this period, finding an increased burden on rural households.

To address severe air pollution issues and improve public health, the State Council of China promulgated its toughest-ever Air Pollution Prevention and Control Action Plan in September 2013 1 . Since then, many aggressive emissions-control measures have been initiated to reduce fossil fuel consumption and associated air pollutant emissions. As a major contributor to air pollution, residential emissions accounted for 33% of primary PM2.5 (particulate matter with an aerodynamic diameter less than 2.5 μm), 50-60% of black carbon and 80-90% of organic carbon 2,3 during 1990-2013, and were responsible for 32% of the air pollution-related premature deaths in China 4 . In response, since 2014 5 , a series of residential energy-transition actions have been implemented by China's central and local governments, which have facilitated a rapid uptake of clean fuels and resulted in remarkable improvements in air quality 5,6 . During 2013-2017, PM2.5 concentration across the country decreased by 11.1% (ref. 5).
However, the magnitude of the residential energy transition across different regions and income groups during this period has not been well examined. In addition, policymakers and the public are greatly concerned about increased costs and the associated energy burden 7-9 (defined as the percentage of household income spent on energy bills) of the transition, considering that free or low-price solid fuels dominate the rural and poor household energy mix 10,11 . More details about the definitions of these terms can be found in Supplementary Note 1.
Article https://doi.org/10.1038/s41560-023-01193-z
To address these issues, we employ data from the China Family Panel Studies (CFPS) to study the link between the energy transition and energy cost-based inequality by tracing the targeted households to different income groups. Moreover, we investigate the differential impacts of the energy transition on household energy cost and energy burden between urban and rural areas and among different regions. The geographical distribution of our samples can be found in Supplementary Fig. 1. Our analysis also reveals determinants of energy burden during the rapid energy transition. The findings could inform the public about the large-scale energy transition in China and the associated changes in energy cost and energy burden, and provide in-depth insights into the multidimensional inequality in energy usage for the governments of China and other developing countries.

Rapid household energy transition
To quantify the residential energy transition in China since 2013, we kept track of the households that appear consistently in at least two adjacent waves (for example, 2013 and 2015 or 2015 and 2017). Meanwhile, to better understand the difference in the magnitude of the energy transition before and after 2013, we also traced the changes in the energy patterns of sample households during 2011-2013. As shown in Fig. 1, during 2011-2013, 851 households switched from solid fuels to clean energy, such as LPG/NG (liquefied petroleum gas/natural gas), solar, biogas and electricity, which accounted for 22.1% of the surveyed participants using solid fuels in 2011. Since 2013, this transition process has accelerated: 1,129 and 1,192 households completed the transition to clean energy during 2013-2015 and 2015-2017, respectively. As a result, the share of households using LPG/NG increased from 38.4% in 2011 to 48.2% in 2017 (Supplementary Fig. 2), whereas the share of households using wood and crop residues dropped substantially from 34.6% to 24.7% during the same period. Meanwhile, note that a small share of households switched back to solid fuels (about 5% of the surveyed households), indicating the existence of the combined use of traditional and clean fuels in this rapid energy transition. This finding is consistent with Tao et al. 10 and Carter et al. 12 , who pointed out that the relatively higher cost of clean fuels affects the sustainability of energy-transition programmes in China.
The transition to clean energy was more evident in rural households than in their urban counterparts. During 2011-2013, 524 rural households in the sample underwent the transition to clean fuels, a share of 61.6% (Fig. 2a). Since 2013, supported by a series of residential energy-transition programmes and socioeconomic development 5,6 , this trend accelerated, and the number of rural households that replaced solid fuels with clean energy reached 730 during 2013-2015 (Fig. 2b) and 802 during 2015-2017 (Fig. 2c), which accounted for 64.7% and 67.3% of all surveyed households that switched from traditional solid fuel to clean energy, respectively. The share of rural households using LPG/NG increased from 22.3% in 2011 to 31.5% in 2017; correspondingly, the share using wood and crop residues decreased from 49.4% to 39.1% (Supplementary Fig. 2). Despite this rapid transition, free or low-price solid fuels still prevailed in rural China, and a large urban-rural disparity in the energy mix still existed. This was particularly true considering the large urban-rural gap in both household income and access to clean energy infrastructure 10,11 .
Meanwhile, among the households involved in the energy transition, the shares of extremely poor and poor households (depicted in Supplementary Table 1), whether urban or rural, were the largest and continued to increase during our study period. Households in poverty accounted for 56.0% of all sampled households that underwent energy transition during 2011-2013 (Fig. 2d). Given the enormous efforts to replace solid fuels in rural and suburban communities since 2013 1,12,13 , the proportion of rural households in poverty among all sampled households that switched from solid fuels to clean fuels increased from 21.3% (during 2011-2013) to 28.2% (during 2013-2015; Fig. 2e) and further to 34.4% (during 2015-2017; Fig. 2f). Overall, households living in poverty contributed 56.1% and 61.0% of the total households that replaced solid fuels with clean fuels during 2013-2015 and 2015-2017, respectively. This finding inevitably raised concern about the affordability of clean fuels among these poor households and about inequality in energy cost and its associated energy burden.

Inequality in energy cost during the rapid energy transition
Fig. 3), the inequality measured by energy cost showed a declining trend, with the Gini index decreasing from 0.462 to 0.433 and further to 0.413 during this rapid energy-transition period. In contrast, when measuring inequality by household income, the Gini index increased from 0.418 to 0.516 during 2013-2017, indicating that household income was more unevenly distributed than energy cost nationwide. This finding is consistent with a previous study showing that China's latest Gini index was more than 0.50 under the influence of rapid economic growth 14 .
The rising inequality in household income will inevitably lead to a higher inequality in energy burden. Figure 3d-f shows great differences in cost inequality by fuel type. Specifically, inequality in central heating cost was the starkest, with Gini indexes larger than 0.8, as more than 80% of households consume none. The cost of LPG/NG/coal use followed: half of it was accounted for by 12.5% of households, and its Gini index was 0.640, 0.600 and 0.565 in 2013, 2015 and 2017, respectively. By comparison, the distribution of electricity costs was relatively even, with the Gini index declining from 0.459 in 2013 to 0.433 in 2017. These findings are consistent with Wu et al. 8 , who measured the energy inequality among rural households in China 11 . Meanwhile, our findings also revealed that the Gini index for the cost of each type of energy source displayed a decreasing trend, especially for LPG/NG/coal costs. This is because more and more households gained access to gas fuel through the effective implementation of the energy-transition campaign. As a result, when decomposing the Gini index by fuel type (Supplementary Fig. 4a), the contribution of LPG/NG/coal cost to the overall energy inequality decreased from 47.1% in 2013 to 39.4% in 2017.
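To make the inequality measure concrete, the following minimal sketch computes an unweighted Gini index from a list of household energy costs. It is an illustration only and not the authors' implementation: the function name `gini` and the sample values are hypothetical, and the survey weights that a CFPS-based estimate would normally use are ignored.

```python
def gini(values):
    """Unweighted Gini index of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i = 1..n indexes the values in ascending order.
    ranked = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n


# Hypothetical costs (RMB): every household pays the same amount.
equal_costs = [100, 100, 100, 100, 100]
# Hypothetical costs (RMB): four of five households consume none.
concentrated_costs = [0, 0, 0, 0, 100]

print(gini(equal_costs), gini(concentrated_costs))
```

In the second hypothetical sample, four of five households pay nothing and the index is about 0.8, which illustrates why a fuel that most households do not consume, such as central heating, yields a very high Gini index.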
The urban-rural gap and regional disparity have been two main factors shaping China's income inequality over the past four decades 11,14 . To gain insights into the impacts of these two structural differences on energy cost, we examined energy inequality in urban versus rural areas and in nine subregions. As shown in Fig. 4a and Supplementary Table 2, urban households generally spent more on energy each year than their rural counterparts. In 2013, the median annual energy bill for urban households in China was 2,400 RMB (1 RMB ≈ US$0.16 in 2013), 71.4% higher than that for rural households. By 2017, the cost for urban households rose to 2,700 RMB (1 RMB ≈ US$0.15 in 2017), 50% higher than that for rural households. Meanwhile, because of the substantial transition from free or cheaper solid fuels to clean fuels and improvements in living conditions, the energy cost for rural households also increased dramatically. During 2013-2017, the annual growth rate of energy costs in rural areas averaged 6.5%, roughly twice the growth rate in urban areas (3.0%). Among rural households, poor households experienced the highest growth in energy cost, with an annual rate of 10.7%, followed by middle-income households.
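The growth rates quoted above can be roughly cross-checked against the quoted medians. The sketch below rests on two assumptions that the text does not state: that the growth rate is a compound annual rate of the median bill, and that the rural medians can be backed out of the "71.4% higher" and "50% higher" comparisons. The helper names are hypothetical.

```python
def implied_rural_median(urban_median, pct_higher):
    """Rural median implied by 'urban is pct_higher percent above rural'."""
    return urban_median / (1.0 + pct_higher / 100.0)


def annual_growth_rate(start, end, years):
    """Compound annual growth rate between two values."""
    return (end / start) ** (1.0 / years) - 1.0


# Urban medians are quoted in the text; rural medians are derived, not quoted.
rural_2013 = implied_rural_median(2400, 71.4)  # roughly 1,400 RMB
rural_2017 = implied_rural_median(2700, 50.0)  # 1,800 RMB

rural_growth = annual_growth_rate(rural_2013, rural_2017, 4)
urban_growth = annual_growth_rate(2400, 2700, 4)
print(f"rural: {rural_growth:.1%}, urban: {urban_growth:.1%}")
```

Under these assumptions the implied rates come out close to the reported 6.5% for rural areas and 3.0% for urban areas.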
In addition, from 2013 to 2017, the Gini index decreased from 0.399 to 0.379 for urban households and from 0.512 to 0.436 for rural households (Fig. 4b-d). This finding indicates that the increased access to commercial energy services in rural areas reduced the intra-group (that is, within rural areas) inequality and its contribution to the overall inequality (Supplementary Fig. 4b,c). To further evaluate the impact of the rapid energy transition on these declines in energy cost-based inequality, we estimated a linear regression model with fixed effects and standard errors clustered at the province level (Supplementary Note 4). The results show that, with the rapid energy transition and growth in household income, inequality in energy cost was reduced to some extent at the provincial level (Supplementary Table 3). Note that, compared with their urban counterparts, the energy inequality among rural households was still evident. Meanwhile, the Lorenz curves for both urban and rural household income were skewed towards the 'perfect inequality' line, raising deep concern about inequality in energy burden during this period of rapid energy transition.
Besides the urban-rural disparity, regional differences in climatic and socioeconomic conditions are expected to affect energy consumption and cost. As shown in Supplementary Fig. 1 and Supplementary Table 4, we classified the surveyed participants into nine regions based on climatic and economic zones. Supplementary Fig. 5a reveals great disparities in energy cost inequality across regions, and Supplementary Fig. 5b shows a significantly negative correlation between the Gini index and the level of economic development for the nine regions in the three years. In this regard, households in well-developed regions generally have convenient access to modern energy, leading to a low disparity in energy costs. In addition, Supplementary Table 5 presents the components of the overall Gini indexes by region during 2013-2017. The results show that the overall inequality across regions came mainly from the between-group effect, indicating that the disparity among different regions was the major source.

Inequality in energy burden
As shown in Fig. 5, the median household spent about 5% of its income on energy during 2013-2017, notably higher than the share for US households (about 3.3%) (refs. 15,16). Meanwhile, we found stark disparities in household energy burden between urban and rural areas. For urban households, the median energy burden declined from 5.4% in 2013 to 4.8% in 2017, while the opposite was observed for rural households (an increase from 5.3% to 6.5%). Thus, with the rapid energy transition in recent years, urban-rural inequality in energy burden increased. When measuring energy inequality using energy poverty 15-19 , defined as a household spending more than 10% of its income on energy, we found that 30.1% to 34.2% of surveyed households experienced energy poverty in China during 2013-2017.
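The two measures used in this section can be written down directly from their definitions: energy burden is annual energy cost divided by annual household income, and a household is in energy poverty when that share exceeds 10%. The following minimal sketch uses hypothetical household records and function names, and ignores survey weights.

```python
from statistics import median

ENERGY_POVERTY_THRESHOLD = 0.10  # burden above 10% of income, as defined in the text


def energy_burden(annual_energy_cost, annual_income):
    """Share of household income spent on energy bills."""
    if annual_income <= 0:
        raise ValueError("income must be positive to define a burden")
    return annual_energy_cost / annual_income


def summarize(households):
    """Return (median burden, share in energy poverty) for (cost, income) pairs."""
    burdens = [energy_burden(cost, income) for cost, income in households]
    in_poverty = sum(1 for b in burdens if b > ENERGY_POVERTY_THRESHOLD)
    return median(burdens), in_poverty / len(burdens)


# Hypothetical households: (annual energy cost in RMB, annual income in RMB)
sample = [(1800, 12000), (2700, 60000), (1500, 30000), (2000, 16000)]
median_burden, poverty_share = summarize(sample)
print(f"median burden: {median_burden:.2%}, energy poverty share: {poverty_share:.0%}")
```

The median is used rather than the mean because a few households with very low reported income can have extremely large burdens.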
Rural communities were disproportionately and increasingly affected by energy poverty, with the share of rural households in energy poverty at 32.2% to 39.2% during the research period. By comparison, the share of urban households in energy poverty decreased from 27.9% to 25.2%. Moreover, as the abrupt energy transition proceeded, the inequality of energy burden between income groups widened somewhat. As shown in Fig. 5, the median energy burden for extremely poor rural households varied between 20.3% and 48.2% across the nine regions in 2013, with more than 75% of these households in energy poverty. They were followed by extremely poor urban households, with a median energy burden of 8.1% to 15.3%, of whom 46.3% to 68.2% were in energy poverty. For rich households, the median energy burden ranged from 0.8% to 3.1% in rural areas and from 1.1% to 3.3% in urban areas, so the urban-rural disparity was not clear. In 2017, extremely poor rural households still suffered from the highest energy burden, and their median energy burden climbed to 29.1% to 55.2% in the nine regions, remarkably heavier than in 2013. Moreover, 80% of them were in energy poverty. Meanwhile, the energy burden on extremely poor urban households also grew slightly (10.8% to 16.8%), but the share of these households in energy poverty rose sharply, to 53.1% to 71.4%. By contrast, the distribution of the median energy burden for rich households, both urban and rural, remained stable.
The regional inequality in energy burden was also obvious. As shown in Supplementary Table 6, with the high space-heating demand but relatively weak economy, households in those regions with cold winters (including SECO, COLD1, COLD3 and HSCW1) generally had the highest median energy burden, paying roughly more than 6.8% of their

Determinants of the inequality of household energy burden
Inequality in energy burden across urban-rural areas and different regions prompted us to consider how household energy burden was associated with variables related to household-level factors and distal factors. More details can be found in Table 1. We constructed linear and nonlinear models to understand which variables, if any, may correlate to household energy burden. More information about the theoretical background behind the selection of driving factors and the build of the model is presented in Methods and Supplementary Note 5.
Results from the linear models are presented in Table 2. Columns labelled 'Model (1)' and 'Model (2)' show the results obtained with the ordinary least squares (OLS) regression models with household fixed effects and province-year fixed effects, respectively. As expected, household economic level directly affects the household's energy burden, and households with older adult members may bear a heavier energy burden because they are likely to use more energy while earning less. However, households with more family members are more likely to carry a lighter energy burden because they have more employed persons and, accordingly, more income. Meanwhile, compared with urban households, their rural counterparts seem more likely to bear heavier energy burdens during this large-scale and rapid energy transition. Regarding the indirect impacts of energy infrastructure on household energy burden, we found that deeper penetration of gas infrastructure is positively related to local households' energy burden because gas is more expensive than traditional solid fuel. As for the impacts of climate conditions on household energy burden, the OLS regression results do not provide statistically strong evidence.
To address the endogeneity between the core explanatory variable (lnIncome) and the explained variable (lnBurden), a two-stage least squares (2SLS) regression analysis was conducted. The column labelled 'Model (3)' in Table 2 reports the regression results, and the robustness test is presented in Supplementary Note 5.
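The logic of 2SLS can be illustrated with a minimal single-regressor, single-instrument sketch on synthetic data. This is not the specification in Table 2: it omits the control variables and fixed effects, all coefficients and the data-generating process are invented for illustration, and `x`, `y` and `z` only loosely stand in for lnIncome, lnBurden and the instrument.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def tsls_slope(z, x, y):
    """Two-stage least squares with one instrument z for one regressor x."""
    # Stage 1: project the endogenous regressor on the instrument.
    pi = ols_slope(z, x)
    mx, mz = mean(x), mean(z)
    x_hat = [mx + pi * (zi - mz) for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    return ols_slope(x_hat, y)


rng = random.Random(0)
TRUE_BETA = -0.3  # synthetic effect of x on y, chosen for illustration
z, x, y = [], [], []
for _ in range(5000):
    u = rng.gauss(0, 1)   # unobserved confounder
    zi = rng.gauss(0, 1)  # instrument, independent of the confounder
    xi = 0.8 * zi + 0.6 * u + rng.gauss(0, 0.5)
    yi = TRUE_BETA * xi - 1.0 * u + rng.gauss(0, 0.5)
    z.append(zi)
    x.append(xi)
    y.append(yi)

beta_ols = ols_slope(x, y)
beta_2sls = tsls_slope(z, x, y)
print(f"OLS: {beta_ols:.3f}, 2SLS: {beta_2sls:.3f}, true: {TRUE_BETA}")
```

In this synthetic setting the confounder pushes the OLS slope away from the true value, while the 2SLS slope, which uses only the variation in `x` driven by the instrument, stays close to it.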

Urban
The index refers to rural household (0) or urban household (1).
Urban morphology generally provides residents with better access to efficient appliances, stable jobs and modern energy infrastructure.

Access_gas
The proportion of population with access to gas fuels in a region.
An exogenous variable that describes household accessibility to modern energy-supply services, which is unlikely affected by the household energy burden.

Climate
The sum of heating degree days and cooling degree days for a province.
Energy demand for both heating and cooling affects household energy costs. In this study, we calculated the sum of heating degree days and cooling degree days for each province to estimate the effect of climate conditions on household energy burden.
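As an illustration of the Climate variable, the sketch below sums heating and cooling degree days from daily mean temperatures. The base temperatures (18 °C for heating, 26 °C for cooling) are assumptions made for this example only; the thresholds and temperature data actually used in the study are not given here, and the function names are hypothetical.

```python
def degree_days(daily_mean_temps, heat_base=18.0, cool_base=26.0):
    """Return (HDD, CDD) for daily mean temperatures in degrees Celsius.

    The default base temperatures are illustrative assumptions.
    """
    hdd = sum(max(0.0, heat_base - t) for t in daily_mean_temps)
    cdd = sum(max(0.0, t - cool_base) for t in daily_mean_temps)
    return hdd, cdd


def climate_index(daily_mean_temps, heat_base=18.0, cool_base=26.0):
    """Climate variable as described in the text: HDD plus CDD."""
    hdd, cdd = degree_days(daily_mean_temps, heat_base, cool_base)
    return hdd + cdd


# Hypothetical daily mean temperatures for four days.
temps = [10.0, 18.0, 22.0, 30.0]
print(degree_days(temps), climate_index(temps))
```

Days between the two base temperatures contribute nothing, so the index grows only with heating or cooling demand.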
Both under-identification and weak-identification tests suggested that the instrumental variable (lnShare_off_farm) was valid. Although the coefficient of lnIncome remained negative, its absolute value decreased dramatically. In this regard, the endogeneity problem seems to be addressed in the 2SLS model, and the final results were generally consistent with those of the OLS models.
Compared with the linear models, the random forest regression model showed better results, with a goodness-of-fit of over 88% (as shown in Fig. 6). Specifically, lnIncome had an overall negative correlation with household energy burden and dominated the other factors (as shown in Supplementary Fig. 6), which was consistent with the results from the linear models. However, except for the dummy variable (Urban), the other four variables (lnAge, lnFamily_size, lnAccess_gas and lnClimate) presented a mixed (both positive and negative) relationship with lnBurden, providing further information about the relationships between these factors and household energy burden. For instance, the results demonstrated a U-shaped relationship between household energy burden and the average age of family members. At first, an increasing average age of family members is negatively associated with household energy burden, which is inconsistent with the results of the linear models, and households with an average age of 25 to 45 generally bear a lower energy burden. Then, as household members get older, the energy burden seems to become heavier, which generally agrees with previous findings 15,17 and our linear models.
From the life-cycle perspective of the change in household income, the result from the nonlinear model seemed to be more reasonable, considering that household income and earnings tend to present an inverted U-shaped curve throughout the life cycle 20,21 , while residential energy consumption increases over the life course 22,23 .
As for the correlation between family size and energy burden, the linear model produced a negative regression coefficient, while the random forest regression model indicated that when the number of family members is more than four, the household energy burden seems to be stable. Because rural households generally have a larger family size and are more likely to use free or low-price energy, this pattern seems to be more reliable.
In addition, the household energy burden also differed across regions with different gas-infrastructure penetration. Regions with low gas-infrastructure penetration are more likely to be underdeveloped. In these regions, an increase in the availability of modern clean energy infrastructure will lead to a rise in energy costs and in the share of income spent on energy. However, when the gas-infrastructure penetration rate in a region exceeds 70%, the household energy burden shows a negative correlation with gas-infrastructure penetration. One probable reason is that regions with high gas-infrastructure penetration are more likely to be well developed, with households that have high incomes. Compared with the other variables mentioned above, the variable lnClimate had a mixed and loose association with the variable lnBurden.

Discussion and conclusions
In the past decade, although a series of actions have been taken to facilitate the residential energy transition and address severe air pollution, abruptly and substantially switching energy sources across the country raised deep concerns about energy inequality 5,6,13 . To address this issue, we employed nationwide survey data to identify the households that underwent a variety of energy-transition pathways during this rapid transition. Our findings show that during 2013-2017, cost-based energy inequality among Chinese households declined but still existed. Notably, households that experienced the energy transition were dominated by low-income groups (that is, extremely poor and poor households), with a share of around 60%. This finding also helps explain why, as a previous study reported 13 , households that adopt clean stoves often continue to use their solid-fuel stoves. Further, the urban-rural disparity and regional gap in China are root causes of inequalities across the economic 11,14 , environmental 24 and health dimensions 25 . However, the question of how the abrupt and massive-scale energy transition affected existing inequalities remains. Our findings indicate that energy cost-based inequality declined within both urban and rural areas. However, the inequality level within rural areas was still high. Moreover, unlike urban households, which experienced a decline in energy burden, rural households saw their energy burden grow heavier. Therefore, both urban-rural and regional inequalities in energy burden were aggravated to some extent.
Our study has several limitations that need to be addressed in future studies. First, we regard households using clean energy for cooking as having completed the energy transition without considering the combined or stacked use 10,13 of traditional solid fuels and clean fuels, which may introduce a partial-transition bias. In reality, households that combine free or low-price traditional fuels with clean fuels generally pay lower energy costs than those that have completed the transition. Given that the main purpose of this study is to examine energy cost-based inequality, excluding them from the sample may cause an overestimation of energy cost and its related financial burden. Therefore, we also treated them as part of the research target because they represent a special group in present-day China. Second, in the CFPS microdata, income and energy cost were self-reported and retrospectively collected and thus inevitably suffered from measurement errors due to privacy concerns. Because surveyed households were asked to recall energy costs for the previous year 26,27 , the recall period can be regarded as recent enough, and a change of energy source within it is unlikely. For the same reason, some detailed household address data were missing at the city level. As a result, the data analysis was conducted at the provincial level. Third, the causal links from specific policies and other socioeconomic factors to energy inequality were not verified in this study.
Energy transformations are now mainstream options for tackling the dual task of mitigating climate change and improving air quality 28,29 . However, progress on the energy transition remains slow in developing countries 30,31 . In this case, deliberate policy design is urgently required in the developing world. In China, to accelerate the implementation of the 'Coal-to-Gas' and 'Coal-to-Electricity' projects, most governments in northern China have been providing subsidies to households 11,32 . But such an approach was unsustainable and placed a heavy burden on local government finance. For example, according to the 2018 Hebei provincial Clean Winter Heating Plan 33 , the government cut the annual maximum subsidy on the gas cost for a household joining the 'Coal-to-Gas' project to 960 RMB from 1,200 RMB in 2017 34 . Meanwhile, this subsidy to participating households generally ran for three years; after that, phasing out coal use will be prohibitively expensive for most of the targeted rural households. Here we put forward policy suggestions from four perspectives as follows.
First, considering the differences in economic development levels and energy endowments, governments should introduce and implement different measures to make residential energy consumption cleaner. For instance, for well-developed and warm regions, including the Southeast and Yangtze River Delta, the key step is to provide easy access to modern clean energy by constructing gas pipeline infrastructure and subsidizing the cost of connecting to a gas pipeline or purchasing a gas boiler. Meanwhile, in the underdeveloped areas of North China with cold winters, households may choose not to switch from coal or crop residues to gas because of the higher cost of gas and their low income 13 . In this case, additional actions to reduce harmful emissions from household energy combustion are necessary. A rural central heating system is one of the main commercially viable and practical solutions for the future. In addition, promoting the adoption of improved pellet stoves, which burn compressed wood or biomass pellets to create a source of heat 35 , would be more practical in terms of a slightly lower price and dramatically cleaner emissions when compared with the traditional coal-burning stove.
Second, lessons from low-income home energy assistance programmes in the United States and European countries can shed some light on helping low-income households use clean energy in tandem with efforts to alleviate poverty. At present, Chinese governments mainly subsidize low-income households' electricity bills; these subsidies do not cover other fuels such as liquefied petroleum gas and natural gas. In this case, governments should expand the scope of energy assistance for low-income households from the electricity sector alone to all modern energy sectors.
Third, previous studies demonstrated that households that adopt clean stoves often concurrently use their solid-fuel stoves for years in Asian developing countries 36-38 , driven by fuel prices and the perceived suitability of stoves for different tasks. In particular, Malakar and Day pointed out the importance of fully understanding the differences in perspectives on different fuels to unpack the dynamics of household energy transition 38 . In this regard, it is critical to help households realize the advantages of clean energy through educational programmes, 'peer effects' 39 , relative rewards and incentives and other means.
Fourth, further promoting urbanization is likely to lessen inequality in both energy cost and its burden on households, owing to the relatively high efficiency of clean energy systems in urban areas. Meanwhile, distributed renewable energy programmes should be carefully designed and implemented to promote the energy transition and household income growth in concert. A targeted solar photovoltaic power adoption programme for poverty alleviation can provide another option for the government 40 . By the end of 2019, this programme had assisted 4.18 million poor households in total 41 , each of which can receive 3,000 RMB annually from power-generation revenue.
It has been demonstrated that many factors, in particular policy efforts, could affect the energy transition and the associated energy burden and energy inequality 5,12,36 . In future work, a systematic assessment of existing measures on the household energy transition can assist in determining more effective policy packages in China and other developing countries. Meanwhile, both intra-household and racial inequalities in energy use and associated cost also merit further investigation. In addition, besides convenient access to clean energy, a complete household energy transition also relies on affordability and thus residents' socioeconomic status. In this regard, further analysis is needed to better understand how to help poor and rural communities make a smooth transition to clean and affordable energy.

Survey design, sampling and implementation
The household-level data we use are retrieved from the CFPS (China Family Panel Studies), which allow us to assess energy inequality and energy burden during 2013-2017. The CFPS focuses on the socioeconomic, demographic and health aspects of the Chinese population in the context of rapid economic development. It is a nationally representative longitudinal survey of Chinese communities, families and individuals launched in 2010 by the Institute of Social Science Survey of Peking University, China. Follow-up surveys of the CFPS are designed to take place on a biennial basis. In this study, we obtained the data from four follow-up surveys in 2012, 2014, 2016 and 2018.
As a longitudinal survey, CFPS collects information on a nationally representative sample of families and all family members in those sampled families. However, conducting surveys in remote, minority regions, especially where travelling is very difficult, is a huge challenge and infeasible. In this case, considering the regional differences in Chinese society and the reduction of survey processing costs, the surveyed areas include 25 provinces or administrative equivalents that represent about 95% of the total population in mainland China (excluding Hong Kong, Macao and Taiwan) 26 . The original target sample size was 16,000 households. Half of the sample was generated by oversampling with five independent sampling frames of Shanghai, Liaoning, Henan, Gansu and Guangdong, and each of the subsamples had 1,600 households. These five provinces were representative of the regional level, which could contribute to provincial population inferences and cross-region comparisons. The other 8,000 households were from an independent sampling frame composed of 20 provinces. The names of provinces, target sample size (in terms of households) and the response rates for different years are provided in Supplementary Table 8. Note that sampled units within each of the 20 provinces in this subsample are not representative at the province level. Through appropriate weighting, the whole CFPS sample can achieve representativeness of the 25 provinces in China, thereby representing China as a whole.
The sample was drawn through a multi-stage probability proportional to size sampling procedure with implicit stratification to reduce the operational costs of the survey while remaining representative of the Chinese population. Each CFPS subsample was drawn through three stages: administrative districts (in urban areas) or counties (in rural areas) as the primary sampling units, neighbourhood communities (in urban areas) or administrative villages as the second-stage sampling units and households as the third-stage sampling units. CFPS are conducted through a face-to-face or telephone interview with a structured questionnaire survey. The questionnaire includes five modules: family roster, focusing on basic sociodemographic information of all family members; family economic conditions, focusing on family income, expenditure and assets; individual self-report for all respondents aged 10 and above; and child proxy questionnaire for respondents aged 0-15. In this study, we collect detailed information on energy choice and energy billing (such as the monthly cost of electricity and the other fuels used and the yearly heating bill). This enables us to derive the annual energy cost for a family.
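As an illustration of how a probability-proportional-to-size draw with implicit stratification can work at a single stage, the following Python sketch applies systematic selection to an ordered list of sampling units. It is a minimal sketch under stated assumptions, not the CFPS implementation: the function name `pps_systematic`, the unit names and the sizes are hypothetical, and the list is assumed to be pre-sorted by the stratification variables.

```python
import random

def pps_systematic(units, n_draws, rng):
    """Systematic PPS selection from an ordered list of (name, size) pairs.

    Sorting the list by stratification variables beforehand gives implicit
    stratification: selection points are spread evenly along the ordering.
    """
    total = sum(size for _, size in units)
    interval = total / n_draws
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + k * interval for k in range(n_draws)]

    selected = []
    cumulative = 0.0
    idx = 0
    for name, size in units:
        cumulative += size
        # A unit is picked once for every selection point falling in its range
        while idx < n_draws and points[idx] < cumulative:
            selected.append(name)
            idx += 1
    return selected

# Hypothetical primary sampling units with population sizes (illustrative only)
units = [("county_A", 100), ("county_B", 300), ("county_C", 50), ("county_D", 550)]
print(pps_systematic(units, 2, random.Random(0)))
```

Units whose size exceeds the sampling interval are selected with certainty, which is why large units are often handled separately in practice.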

Data quality control
As a national comprehensive survey programme, the CFPS has a large sample size, wide survey coverage and a complicated design. Strict quality-control measures are therefore essential for the survey. To ensure the quality of the data, the survey applied different methods of monitoring and intervention at different sampling stages.
First, in the 2010 CFPS baseline survey, the data quality was primarily affected by precision in the ultimate sampling frame. At this stage, the factors that might affect the data quality include but are not limited to improper questionnaire design, inaccurate terminal-stage sampling, irregular interviewer behaviour and mistakes in the data collection and compilation processes. To control the quality, a supervision strategy was adopted in the ultimate sampling frame, including telephone checks, field checks, audio record checks, interview reviews and statistical analyses.
Second, in the follow-up surveys, several quality-control strategies such as statistical data checks, audio record checks and phone call checks were applied. Specifically, statistical data checks covered 100% of the completed interviews, and all the variables in each questionnaire were checked every day. Meanwhile, all interviewers were subject to all the checking methods, and the data from each questionnaire were checked. The problematic interviewing behaviours were also delineated, and objective criteria for judging misconduct were set. In addition, households whose family or individual questionnaires were completed were checked via audio records, phone calls, onsite visits and interview reviews at rates of 15%, 25%, 15% and 5%, respectively.
Third, to clean up the invalid parts of the sample, a data-cleaning process was applied at different stages for each dataset. More detailed information about data cleaning for variables can be found in the China Family Panel Studies User's Manual (3rd edition) 27 .

Representativeness of the sample
Supplementary Table 9 compares household characteristics in the CFPS survey with the official statistics. Overall, the demographic variables, including average household size, male percentage and age composition, are consistent with the official numbers. For instance, the average household size was 3.27 members in CFPS 2012, with 50.5% male and 10.9% of the population aged 65 years or over. These indicators closely matched the National Bureau of Statistics values (NBS 2012). Moreover, the household survey data on economic conditions and their composition were also very similar to the official statistics. For example, the average per capita incomes of urban and rural households in CFPS 2012 were 23,365 Yuan and 9,569 Yuan, respectively, which were not significantly different from the official numbers (23,979 Yuan and 9,833 Yuan, respectively). In this regard, the CFPS survey can be regarded as a representative sample of Chinese households.
In addition, to ensure the representativeness of these data, we compared the proportions of households reporting fuel use for cooking based on the tracking sample with those based on the total sample and found the results from the two samples are wholly consistent as shown in Supplementary Table 10, indicating the tracking samples can well represent the whole sample.

Classification of income groups and regions
To classify the surveyed households into different income groups, we used the grouped income data from the national average as proxies to split the data into 12 groups. The data are shown in Supplementary Table 1.
To explore the regional differences in household energy cost and associated energy burden, the entire study area was divided into nine regions according to climatic conditions and economic development levels. The literature 7,8,25 usually classifies the mainland into seven regions (Northwest, Northeast, North, Central, Yangtze River Delta, Southwest and Southeast) based on geographical proximity and development stage (region definitions are included in Supplementary Table 4 and Supplementary Fig. 1a). With respect to climatic conditions, it is common to classify the country into five climatic zones (severe cold winter, cold winter, hot summer and cold winter, hot summer and warm winter, and temperate weather) following the Standard of Climatic Regionalization for Architecture (GB 50178-1993). According to this standard, we subdivided the seven regions into nine regions as shown in Supplementary Fig. 1b: the northeast region with severely cold winter (SECO), the northwest region with cold winter (COLD1), the central region with cold winter (COLD2), the north region with cold winter (COLD3), the southwest region with hot summer and cold winter (HSCW1), the central region with hot summer and cold winter (HSCW2), the Yangtze River Delta with hot summer and cold winter (HSCW3), the southeast region with hot summer and warm winter (HSWW) and the southwest region with temperate weather all year (TEMP).

Measures of household energy cost and associated inequality
By collecting the cost of electricity and fuels per month and the annual central heating cost, we calculated the aggregated energy cost as:

$$ C_{\mathrm{all}} = 12 \times (C_{\mathrm{elec}} + C_{\mathrm{fuel}}) + C_{\mathrm{heat}} \tag{1} $$

where $C_{\mathrm{all}}$ is the aggregated household energy cost and $C_{\mathrm{elec}}$, $C_{\mathrm{fuel}}$ and $C_{\mathrm{heat}}$ represent electricity cost per month, other fuel costs per month and the annual cost of central heating, respectively.
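The aggregation in equation (1) can be sketched in a few lines of Python. The function name and the household figures below are hypothetical and serve only to show how monthly and annual items are combined.

```python
def annual_energy_cost(c_elec_month, c_fuel_month, c_heat_year):
    """Equation (1): C_all = 12 * (C_elec + C_fuel) + C_heat."""
    return 12 * (c_elec_month + c_fuel_month) + c_heat_year

# Hypothetical household: 80 Yuan/month electricity, 40 Yuan/month other fuels,
# 1,200 Yuan/year central heating
print(annual_energy_cost(80, 40, 1200))  # 2640
```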
This bottom-up cost-based accounting approach has several merits. First, it allows us to aggregate household energy costs by fuel type. Second, it can accommodate the estimation and aggregation of unmeasurable energy, such as biomass. Third, to some extent, this indicator provides us with a new and proper perspective on addressing the energy inequality issue, considering the inequality in the energy mix that is often ignored in previous studies. However, it inevitably suffers from the common challenge of self-reporting bias in collecting microdata.
As two of the most widely used analytical tools for measuring inequality, the Lorenz curve and the Gini index were used in this study. The Lorenz curve was developed by Lorenz in 1905 to represent the inequality of wealth distribution in a population 42 . In this study, the energy-cost Lorenz curve was defined as a ranked distribution with the cumulative percentage of households on the horizontal axis and the cumulative percentage of energy cost on the vertical axis. A point on the energy-cost Lorenz curve shows that y% of the overall energy cost is incurred by x% of the sampled households.
The Gini index proposed by Gini 43 is a numerical representation of the inequality in income or wealth. It is typically defined mathematically on the basis of the Lorenz curve, where $H_i$ indicates the cumulative number of households from 1 to i, ranked from the lowest to the highest energy cost, and $C_i$ denotes the corresponding cumulative energy cost of household groups from 1 to i.
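To make the Lorenz-curve construction concrete, the Python sketch below computes cumulative household shares and cumulative cost shares and then a Gini index. It assumes the common trapezoidal form G = 1 - Σ (H_i - H_{i-1})(C_i + C_{i-1}) with both series expressed as cumulative shares; this form is an assumption chosen to be consistent with the definitions above and is not necessarily the exact expression used in the study. Function names and data are hypothetical.

```python
def lorenz_points(costs):
    """Return cumulative household shares H and cumulative cost shares C.

    Households are ranked from lowest to highest energy cost; both lists
    start at 0 and end at 1.
    """
    ordered = sorted(costs)
    total = sum(ordered)
    n = len(ordered)
    h, c = [0.0], [0.0]
    running = 0.0
    for i, value in enumerate(ordered, start=1):
        running += value
        h.append(i / n)
        c.append(running / total)
    return h, c

def gini(costs):
    """Gini index via the trapezoidal rule applied to the Lorenz curve."""
    h, c = lorenz_points(costs)
    area_terms = sum(
        (h[i] - h[i - 1]) * (c[i] + c[i - 1]) for i in range(1, len(h))
    )
    return 1.0 - area_terms

# Hypothetical annual energy costs (Yuan) for four households
print(gini([1000, 2000, 3000, 4000]))
```

With this discrete form, perfectly equal costs give 0, and a sample in which one household bears the entire cost gives (n - 1)/n.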
To assess the robustness of the Gini index, this study also applied the Theil and Atkinson indices to quantify the energy cost-based inequality trends. As shown in Supplementary Fig. 7, when employing the three indexes to quantify the household energy cost and income inequalities, their patterns and trends highly agreed with each other, indicating the robustness of the Gini index in this study.
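For reference, the two robustness indices can be sketched as follows. The code assumes the Theil T index and the Atkinson index in their standard textbook forms; the inequality-aversion parameter `epsilon` is an assumed default, since no value is stated here, and all costs are assumed to be strictly positive.

```python
import math

def theil_t(values):
    """Theil T index: mean of (x/mu) * ln(x/mu); requires positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((v / mu) * math.log(v / mu) for v in values) / n

def atkinson(values, epsilon=0.5):
    """Atkinson index; epsilon is an assumed inequality-aversion parameter."""
    n = len(values)
    mu = sum(values) / n
    if epsilon == 1:
        # Limit case: one minus the ratio of geometric to arithmetic mean
        geometric_mean = math.exp(sum(math.log(v) for v in values) / n)
        return 1.0 - geometric_mean / mu
    equally_distributed = (
        sum(v ** (1 - epsilon) for v in values) / n
    ) ** (1 / (1 - epsilon))
    return 1.0 - equally_distributed / mu

# Hypothetical annual energy costs (Yuan)
costs = [1000, 2000, 3000, 4000]
print(theil_t(costs), atkinson(costs))
```

Both indices equal zero under perfect equality and grow as costs become more dispersed, so their trends can be compared directly with those of the Gini index.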

Gini indexes by energy sources and regions
To investigate the influence of fuel types on energy-cost inequality, the Shapley approach 44 was applied to decompose the Gini index into a sum of contributions from electricity, fuel and central heating costs. In the decomposition, n is the total number of players and $\phi_i(\mathrm{val})$ denotes the amount that player i receives in a given coalitional game. $\mathrm{val}_x(S)$ is the prediction for the feature values in set S, marginalized over the features not included in S; it is known as the coalition force or the worth of the coalition.
In addition, the Gini index was decomposed into an urban component and a rural component, and across the nine regions, following a strategy similar to that suggested by Yang 44 and Wu et al. 8 , where $n_i$ is the sample size in group i and n is the total sample size. $w_i$ is the ith group's proportion of energy cost or income, and $G_i$ is the ith group's Gini index. $w_k$ is the kth group's proportion of energy cost or income, with k ≠ i. $G_{\mathrm{within}}$ represents the inequality within group i, and $G_{\mathrm{between}}$ measures the disparity across groups. As a residual, $G_{\mathrm{overlap}}$ expresses the magnitude of the overlap between the urban and rural groups or among the nine regions. When household energy costs in different groups do not overlap, $G_{\mathrm{overlap}}$ is zero. For example, if the highest energy cost in the rural group is lower than the lowest energy cost in the urban group, then $G_{\mathrm{overlap}}$ equals zero. In short, the size of this term and its share of the overall Gini index depend on the degree of overlap among subgroups. There is so far no common and convincing interpretation of this term in practice. This study therefore applied the technique primarily to estimate the changes in $G_{\mathrm{within}}$ and $G_{\mathrm{between}}$ .
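A source-wise Shapley decomposition of this kind can be sketched in Python as below. The sketch uses the standard Shapley value, in which each player's contribution is the weighted average of val(S ∪ {i}) - val(S) over all coalitions S that exclude i, with weights |S|!(n - |S| - 1)!/n!. The characteristic function is an assumption made for illustration: val(S) is taken as the Gini index of total cost when every source outside S is replaced by its sample mean, so val of the empty coalition is zero and the contributions sum to the overall Gini index. The study may define the coalition worth differently; all names and data are hypothetical.

```python
from itertools import combinations
from math import factorial

def gini(values):
    """Gini index from ranked values (rank-weighted sum formula)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(ordered, start=1))
    return weighted / (n * total)

def shapley_gini_by_source(sources):
    """sources: dict mapping source name -> list of per-household costs."""
    names = list(sources)
    n_players = len(names)
    n_households = len(sources[names[0]])
    means = {k: sum(v) / n_households for k, v in sources.items()}

    def val(coalition):
        # ASSUMPTION: sources outside the coalition are equalized at their mean
        totals = [
            sum(sources[k][h] if k in coalition else means[k] for k in names)
            for h in range(n_households)
        ]
        return gini(totals)

    contributions = {}
    for player in names:
        others = [k for k in names if k != player]
        phi = 0.0
        for size in range(len(others) + 1):
            weight = (
                factorial(size) * factorial(n_players - size - 1)
                / factorial(n_players)
            )
            for subset in combinations(others, size):
                coalition = set(subset)
                phi += weight * (val(coalition | {player}) - val(coalition))
        contributions[player] = phi
    return contributions

# Hypothetical annual costs (Yuan) by source for four households
sources = {
    "electricity": [600, 720, 840, 960],
    "fuel": [360, 120, 480, 240],
    "heating": [0, 0, 1000, 2000],
}
print(shapley_gini_by_source(sources))
```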
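The subgroup decomposition described above can be illustrated with the following Python sketch. It assumes one common form of the decomposition: the within term is the sum over groups of (n_i/n) × w_i × G_i, the between term is the Gini index obtained when every household is assigned its group mean, and the overlap term is the residual. These formulas are assumptions consistent with the symbol definitions above rather than a reproduction of the study's equations, and the data are hypothetical.

```python
def gini(values):
    """Gini index from ranked values (rank-weighted sum formula)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(ordered, start=1))
    return weighted / (n * total)

def decompose_gini(groups):
    """groups: dict mapping group name -> list of household energy costs."""
    pooled = [v for costs in groups.values() for v in costs]
    n = len(pooled)
    total = sum(pooled)

    g_total = gini(pooled)
    # Within term: population share x cost share x group Gini, summed over groups
    g_within = sum(
        (len(costs) / n) * (sum(costs) / total) * gini(costs)
        for costs in groups.values()
    )
    # Between term: every household replaced by its group mean (assumed form)
    smoothed = [
        sum(costs) / len(costs) for costs in groups.values() for _ in costs
    ]
    g_between = gini(smoothed)
    # Overlap term: residual
    g_overlap = g_total - g_within - g_between
    return {
        "total": g_total,
        "within": g_within,
        "between": g_between,
        "overlap": g_overlap,
    }

# Hypothetical costs: rural and urban ranges do not overlap, so overlap is ~0
print(decompose_gini({"rural": [1, 2, 3], "urban": [4, 5, 6]}))
```

In the non-overlapping example the residual vanishes up to floating-point error, matching the property noted above that the overlap term is zero when the highest rural cost is below the lowest urban cost.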

Determinants of the inequality of household energy burden
Energy burden varies across the country, as do the household demographics and regional conditions relevant to energy consumption patterns. Here we sought to understand the relationship between differences in energy burden and their driving factors among households. Previous work provided a theoretical framework for understanding the concept of energy burden and its measurement 15–18,38,45,46 ; energy burden is generally defined as the percentage of household income spent on energy bills. Theoretically, drivers of high household energy burden fall into five main categories: household socioeconomic situation, housing characteristics, energy prices and policies, location and geographical conditions, and behavioural factors 15,45,46 . Following previous studies and considering data availability, this study evaluated a comprehensive set of household-level factors (household demographic characteristics and socioeconomic status) and several distal factors (gas access penetration, urban-rural settings and climate conditions) that have previously been associated with energy burden.
In reality, the relationships among variables are not necessarily linear, and nonlinear models may be more appropriate because of their flexible functional form, fewer constraints, lack of distributional requirements and strong adaptability. However, nonlinear relationships among variables are harder to identify and analyse. We therefore constructed both linear and nonlinear models to understand which factors correlate significantly with household energy burden.
First, we performed a multivariate linear regression analysis for the total sample, in which the natural logarithm of household energy burden (lnBurden) was the dependent variable and household income level (lnIncome) was the core explanatory variable. To eliminate bias from unobserved variables, including those that vary over time but are constant across entities and those that differ across entities and provinces but are constant over time, the models were estimated as unbalanced panel regressions with household fixed effects and province-year fixed effects:

$$ \ln \mathrm{Burden}_{it} = \alpha_0 + \alpha_1 \ln \mathrm{Income}_{it} + \beta X_{it} + \varepsilon_{it} \tag{7} $$

where $\ln \mathrm{Burden}_{it}$ represents the energy burden of household i in year t, and $\alpha_0$ and $\alpha_1$ represent the intercept and the regression coefficient of the core explanatory variable, household per capita income in ln form ($\ln \mathrm{Income}_{it}$), respectively. $X_{it}$ denotes a set of control variables in ln form listed in Table 1 in the main text, and $\varepsilon_{it}$ is the error term. Considering the existence of heteroskedasticity in the province-year fixed effect regression model, standard errors clustered at the province level were used in the baseline specification. The empirical results might suffer from endogeneity because of omitted variables or reverse causality. To address this concern, we introduced an instrumental variable, lnShare_off_farm, into a two-stage least squares (2SLS) model. A detailed introduction to the methodology and application notes can be found in Supplementary Note 5.
The first-stage estimation in the 2SLS regression model can be defined as follows.
The second-stage estimation can be shown as follows.
$$ \ln \mathrm{Burden}_{it} = \alpha_0 + \alpha_1 \widehat{\ln \mathrm{Income}}_{it} + \beta X_{it} + \varepsilon'_{it} \tag{9} $$

where $\widehat{\ln \mathrm{Income}}_{it}$ is the predicted value of $\ln \mathrm{Income}_{it}$ from the first stage. Finally, the province-year fixed effect model with standard errors clustered at the provincial level was estimated to test the robustness of the abovementioned model.
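The two-stage logic can be illustrated with a deliberately simplified Python sketch: one endogenous regressor (ln income), one instrument (the share of off-farm income), no control variables and no fixed effects. It is an assumption-laden illustration rather than the specification estimated in the study, and the data are constructed so that the confounder is exactly uncorrelated with the instrument.

```python
def ols_simple(x, y):
    """Intercept and slope of a one-regressor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def two_stage_least_squares(instrument, endogenous, outcome):
    """2SLS with a single instrument; controls and fixed effects omitted."""
    # First stage: regress the endogenous regressor on the instrument
    g0, g1 = ols_simple(instrument, endogenous)
    fitted = [g0 + g1 * z for z in instrument]
    # Second stage: regress the outcome on the first-stage fitted values.
    # Standard errors from this manual second stage would NOT be valid.
    a0, a1 = ols_simple(fitted, outcome)
    return {"first_stage": (g0, g1), "second_stage": (a0, a1)}

# Hypothetical data: u is an unobserved confounder, orthogonal to the instrument
ln_share_off_farm = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]
ln_income = [1 + 2 * z + e for z, e in zip(ln_share_off_farm, u)]
ln_burden = [3 - 0.5 * x + 2 * e for x, e in zip(ln_income, u)]

print(ols_simple(ln_income, ln_burden))  # naive OLS, biased by u
print(two_stage_least_squares(ln_share_off_farm, ln_income, ln_burden))
```

In this constructed example the naive OLS slope is pulled towards zero by the confounder, while the second-stage slope recovers the coefficient of -0.5 used to generate the data.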
In addition, the literature shows that linear models generally fail to capture nonlinear relationships that may exist among variables, whereas nonlinear models can capture them because they have a flexible functional form, fewer constraints, no requirements for data distribution and strong adaptability 47 . Nonlinear regression has therefore become increasingly popular in energy economics 48 . Among nonlinear regression techniques, random forest regression has been shown to be an excellent tool in previous studies, given its robust regression power and easily interpretable learning mechanism 49 . We therefore used the randomForest package in RStudio 1.2.1335 44 to determine whether nonlinear relationships exist in the data. We then used the root mean square error to evaluate the performance of the linear and nonlinear models.
Moving beyond the machine-learning model, we employed partial dependence plots (PDPs) to examine the nonlinear relationships between all variables and household energy burden. A PDP is a graphical description of the marginal response of the dependent variable to each variable, obtained by fixing all other explanatory variables at their mean values. Locally estimated scatterplot smoothing curves were applied in this paper. These curves were drawn in RStudio 1.2.1335 44 . Confidence intervals were constructed using a bootstrap method with 1,000 replications, and 95% confidence intervals for each locally estimated scatterplot smoothing curve are presented in the figures.
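The model comparison by root mean square error can be sketched as follows. The observed and predicted values are hypothetical placeholders; in practice the predictions would come from the fitted linear and random forest models.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))

# Hypothetical ln(energy burden) values and two sets of model predictions
observed = [-2.0, -2.5, -3.0, -3.5]
linear_pred = [-2.2, -2.4, -3.3, -3.1]
forest_pred = [-2.1, -2.5, -3.1, -3.4]

# The model with the lower RMSE fits these observations more closely
print(rmse(observed, linear_pred), rmse(observed, forest_pred))
```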
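The partial dependence calculation described above, in which one variable is varied while the others are held at their mean values, can be sketched in Python as follows. The `toy_model` function is a hypothetical stand-in for a fitted random forest, the variable names are illustrative, and the smoothing and bootstrap steps are omitted.

```python
def partial_dependence(predict, rows, feature, grid):
    """Vary one feature over a grid while holding the others at sample means.

    predict: callable taking a dict of feature values and returning a number.
    rows: list of dicts with the observed feature values.
    """
    names = list(rows[0].keys())
    means = {name: sum(row[name] for row in rows) / len(rows) for name in names}
    curve = []
    for value in grid:
        point = dict(means)      # all other features fixed at their means
        point[feature] = value   # only the feature of interest changes
        curve.append((value, predict(point)))
    return curve

# Hypothetical fitted model: ln(burden) declines with ln(income), with curvature
def toy_model(row):
    return (
        3.0
        - 0.8 * row["ln_income"]
        + 0.02 * row["ln_income"] ** 2
        + 0.1 * row["household_size"]
    )

rows = [
    {"ln_income": 8.0, "household_size": 2.0},
    {"ln_income": 9.0, "household_size": 4.0},
    {"ln_income": 10.0, "household_size": 3.0},
]
for value, response in partial_dependence(toy_model, rows, "ln_income", [8.0, 9.0, 10.0]):
    print(value, round(response, 3))
```

A widely used alternative averages the model predictions over all observed rows at each grid value instead of evaluating the model at the mean point; the two approaches coincide only when the model is additive in the other variables.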

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The household-level data supporting the findings of this study are openly available from the Institute of Social Science Survey at Peking University at www.isss.pku.edu.cn/cfps. The dataset contains household identifiers, location, energy costs by fuel types, basic sociodemographic information of all family members and family economic conditions, all of which are derived and generated by the authors. The household identification variable allows us to track the households in follow-up surveys. Sociodemographic data at the province level are collected from the China Census Bureau via API at http://www.stats.gov.cn/tjsj/. The daily average temperature data were obtained from 665 meteorological observation stations in 2013, 648 meteorological observation stations in 2015 and 563 meteorological observation stations in 2017, which can be downloaded from http://data.sheshiyuanyi.com/WeatherData/. Requests for all primary data will be reviewed and made available upon reasonable request. Source data are provided with this paper.

Code availability
Requests for the code developed and annotated in Stata (Version 15) and R (Version 4.0.2) to process and analyse the primary data will be reviewed and made available upon reasonable request. Article https://doi.org/10.1038/s41560-023-01193-z