Assessment of county-level poverty alleviation progress by deep learning and satellite observations

Poverty alleviation is one of the greatest challenges faced by low-income and middle-income countries. China, which had the largest rural poverty-stricken population, has made tremendous efforts to alleviate poverty, especially since the implementation of the targeted poverty alleviation (TPA) policy in 2014. Yet the success of the policy remains unclear, because the official statistics are not available in a timely manner and are in some cases questionable. This study combines deep learning with multiple satellite datasets to estimate county-level economic development from 2008 to 2019 and to assess the effect of the TPA policy for 592 national poverty-stricken counties (NPCs) at the country, provincial and county levels. Per capita gross domestic product (GDP) is used to measure the affluence level. From 2014 through 2019, the 592 NPCs experienced an average per capita GDP growth rate of 7.6%±0.4%, higher than the average growth rate of 310 adjacent non-NPC counties (7.3%±0.4%) and of the whole country (6.3%). This indicates an overall success of the TPA policy so far. We also identify 42 counties with weak recent growth and find that the average affluence level of the NPCs in 2019 is still much lower than the national or provincial averages. The inexpensive, timely and accurate method proposed here can be applied to other low-income and middle-income countries for affluence assessment.

A major barrier to assessing the success of the TPA policy is the lack of timely, reliable socioeconomic data, which are vital to ending poverty. Intensive socioeconomic household surveys and censuses are costly and time-consuming, making timely updates of poverty estimates virtually impossible 12 . In fact, survey and census data are not available for most counties in 2018 or for any county in 2019. Furthermore, the survey data in several regions of China remain unreliable 13,14 . This absence of economic data is a serious constraint on both research and policy, making it difficult to measure poverty gaps, understand why these gaps exist, and evaluate programs aimed at improving overall development.
Satellite imagery can be acquired rapidly and contains abundant information about landscape features that may be correlated with economic activity, so it can be used to infer both spatial and temporal differences in local-level economic well-being 17 . In this study, we propose an independent method to estimate the affluence level and the success of the TPA policy at the county level for the 592 NPCs by combining deep learning with multiple remote sensing datasets. The remote sensing datasets provide different perspectives on the scenes around the counties, and the presented model is able to recognize discriminative, economy-related features in these datasets. As detailed in Methods, the datasets used include remote sensing images (nighttime light data, Google Earth images, Sentinel-1, Sentinel-2 and Landsat 8 images), leaf area index (LAI) data, and county boundary data. Our deep learning framework is able to integrate dispersed data with missing values from multiple sources to predict year- and county-specific population and gross domestic product (GDP). Combining deep learning with these disparate data can effectively reduce the prediction bias caused by label noise 15 and improve prediction performance.
We address four main questions. First, how reliable is our method for predicting economic growth? Second, how did economic development in the NPCs change before and after the implementation of the TPA, especially at the county level? Third, how has economic development varied across the NPCs in different provinces over time? Fourth, how have differences in economic development between the NPCs and their neighboring non-NPCs evolved over time?
To answer these four questions, we use per capita GDP to measure the affluence level of each NPC, as it reflects how the economy grows relative to its population. We compute the agreement between satellite-based and survey-based county-level economic development estimates to validate the precision and reliability of the presented approach. Then, based on the predicted GDP and population of each county from 2008 to 2019, we calculate its per capita GDP and the annual growth rate of per capita GDP (AGR-pcGDP) for 2009-2019; here, the AGR-pcGDP for a given year represents the growth from the previous year to that year. Next, we assess the effects of the TPA policy on the 592 NPCs by inferring both spatial and temporal differences in local economic development. Specifically, we compute the AGR-pcGDPs of the 592 NPCs before (2009-2013) and after (2014-2019) the implementation of the TPA to determine whether the NPCs had higher economic growth after the policy took effect. To measure spatial variation in local economic outcomes from different perspectives, we compare the AGR-pcGDPs of the NPCs with those of the whole country, the provinces and their 310 neighboring non-NPCs.
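The per capita GDP and AGR-pcGDP calculations described above can be illustrated with a minimal sketch. The function names and the input values are hypothetical, the years are assumed to be consecutive, and the period average is assumed to be an arithmetic mean of the yearly rates, which the text does not state explicitly.

```python
def per_capita_gdp(gdp, population):
    """Per capita GDP for each year: GDP divided by population."""
    return {year: gdp[year] / population[year] for year in gdp}


def annual_growth_rates(pc_gdp):
    """AGR-pcGDP (%) of each year relative to the previous year.

    Assumes the years in pc_gdp are consecutive; the first year has
    no predecessor and therefore no growth rate.
    """
    years = sorted(pc_gdp)
    return {
        year: 100.0 * (pc_gdp[year] - pc_gdp[prev]) / pc_gdp[prev]
        for prev, year in zip(years, years[1:])
    }


def mean_rate(rates, first_year, last_year):
    """Arithmetic mean of yearly rates over an inclusive year range
    (an assumption; the averaging scheme is not given in the text)."""
    selected = [r for y, r in rates.items() if first_year <= y <= last_year]
    return sum(selected) / len(selected)


# Hypothetical predictions for one county (illustrative values only).
gdp = {2012: 100.0, 2013: 110.0, 2014: 121.0}
population = {2012: 10.0, 2013: 10.0, 2014: 10.0}
rates = annual_growth_rates(per_capita_gdp(gdp, population))
```

With predictions for 2008-2019, the same functions would yield rates for 2009-2019, and `mean_rate(rates, 2009, 2013)` and `mean_rate(rates, 2014, 2019)` would give the before and after averages.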

Results
Validation of the performance of the deep learning model. The deep learning approach is illustrated in Supplementary Figs. 2 and 3. It relies on spatial features contained in remote sensing images to estimate the population and GDP of each county.
To evaluate whether the learned feature representation by our framework can distinguish different object categories from remote sensing images, we map the high-dimensional learned feature to a 2-D space. The final feature layer is obtained by averaging each feature layer in the deep learning model. Supplementary Fig. 4 shows an example of the extracted features of different categories in the NPCs. It is observed that the deep learning model is able to recognize semantically meaningful features from the remote sensing images with cluttered background.
High-quality training and validation data are important for enhancing the performance of the deep learning model. Considering that the quality of the census data varied across years, we use Benford's law 16
We train separate models for GDP, population and per capita GDP. To better evaluate the differences in the NPCs before and after the TPA, we evaluate their performance not only at the county level but also at the provincial and national levels (therefore, in addition to predicting per capita GDP directly, we also derive it from the predicted GDP and population).
Table 1). The per capita GDP of each selected non-NPC is close to that of its respective adjacent NPC. Specifically, the difference in per capita GDP in 2013 between any non-NPC and its paired NPC is below 1,400 Yuan in each of the first 14 quantile ranges and is below 2,700 Yuan in the 16th (i.e., richest) quantile range.
When results are averaged in any but the first of the 16 quantile ranges, we find high correlations across the years between the predicted and surveyed per capita GDP data for the NPCs (R² ≥ 0.92) and for their respective non-NPCs (R² ≥ 0.94) (Supplementary Table 1). This provides a basis for our quantile-based analysis. The correlation is lower (R² = 0.72) for the NPCs in the first quantile range (i.e., the poorest), whose AGR-pcGDP is thus not analyzed in the following.
Compared to more affluent regions, these four provinces have relatively inconvenient transportation, weak infrastructure and limited socioeconomic development, and have the lowest industrialization level and the highest poverty rate in China 7 . We thus focus on the NPCs in these four less developed provinces to further assess the economic development of the NPCs.
For each province, we compute the correlations between the predicted provincial average (from all NPCs) annual per capita GDP and the respective survey data.
Supplementary Table 4 shows high correlations (R² ≥ 0.96) between the prediction and survey data of the NPCs in the four provinces. For comparison, we also select the non-NPCs in each province from the aforementioned non-NPC list, including 8 in Gansu, 6 in Guizhou, 18 in Shaanxi, and 21 in Yunnan, with a total of 45.
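As an illustration of the agreement statistic, the sketch below computes R² as the squared Pearson correlation between predicted and surveyed values. This definition is an assumption, since the text reports R² as a correlation without giving its formula, and the function name and inputs are hypothetical.

```python
def r_squared(predicted, surveyed):
    """Squared Pearson correlation between two equal-length sequences.

    Assumes R^2 is defined as the squared correlation coefficient;
    the text does not state the exact definition used.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(surveyed) / n
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, surveyed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in surveyed)
    return cov * cov / (var_p * var_s)


# Illustrative values only: a perfectly linear relation gives R^2 = 1.
perfect = r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
partial = r_squared([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```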
Supplementary Table 4 shows high correlations (R² ≥ 0.92) between the prediction and survey data for the 45 non-NPCs. Fig. 2 shows that before the TPA, the average AGR-pcGDP of the non-NPCs in each of the four provinces is higher than that of their NPCs by 0.1%-4.7%. After the TPA, the average AGR-pcGDPs of the NPCs in Gansu, Guizhou and Yunnan provinces are higher than those of the corresponding non-NPCs by 6.4%, 0.4% and 0.6%, respectively. In Shaanxi, the difference in AGR-pcGDPs between the NPCs and non-NPCs is reduced from 4.7% before the TPA to 1.1% after the TPA. The growth rates of the NPCs are much larger than the provincial average obtained from the Statistical Yearbook in Gansu and comparable to the provincial averages in Guizhou, Shaanxi and Yunnan. Further targeted support is needed in the latter three provinces to accelerate the growth of their NPCs.
the mean AGR-pcGDP of all NPCs (7.6%±0.4%) is higher than that for the 310 non-NPCs (7.3%±0.4%), and for the whole country (6.3%) (Supplementary Fig. 6).
These results indicate that the TPA has an overall targeted, positive effect on the growth of NPCs.
Room for policy improvement. Results from our timely and inexpensive method reveal an overall success of the policy so far as well as areas that need improved targeted support. However, not all NPCs have experienced sustained high growth rates, based on our estimate. There are 42 NPCs (Supplementary Fig. 1) whose average AGR-pcGDP has declined over the years to values less than 5.0% (Fig. 4), much lower than the mean AGR-pcGDP of all NPCs. Most of these NPCs are in areas with inconvenient transportation and poor natural conditions, or rely heavily on resource-consuming enterprises that are affected by continuously strengthened environmental protection policies. More targeted support, financial and/or technological, should be given to these NPCs.

Discussion
This is the first study to evaluate the long-term impact of China's TPA policy on poverty mitigation at the county level. Our results based on national, provincial and county-level analyses suggest that since 2014, the affluence of NPCs measured by per capita GDP has on average grown at a faster rate than that of adjacent non-NPCs. This suggests an overall success of the TPA policy so far. However, the growth rates of many NPCs have declined to values below 5% in recent years, and the predicted average affluence of the NPCs in 2019 is still much lower than the levels of the nation and the respective provinces. Continuous, sufficient targeted support to the NPCs is still needed to enhance their economic performance and social welfare.
Note that people in poor regions of China can move to developed regions for work, but our model cannot fully capture the effect of migration on income. Nonetheless, the increased income of migrants is often sent back to their hometowns and spent on housing and other aspects that affect land use, land cover and/or nighttime light. Such an indirect effect is captured by the satellite data and our model framework.
Our results can be affected by the quality of county-level survey data. We investigate this issue based on available information. The local governments in Inner Mongolia have admitted that their published survey data in 2014 were inflated 13,14 . Through strict supervision, the quality of their survey data has improved since 2017.
Table 5).
This study demonstrates the capability of deep learning combined with publicly available, timely data sources in estimating socioeconomic development. The estimates are close to the official survey data. Our approach is not able to precisely predict the GDP and population of each single county. However, combining results from multiple counties greatly reduces the influence of random errors and leads to satisfactory prediction of GDP, population and affluence, at the expense of reduced spatial resolution. Our inexpensive, convenient and reliable model framework for socioeconomic prediction complements the official data obtained through time-consuming and resource-intensive surveys. In particular, our framework can be applied to areas that are difficult for census takers to reach and to periods for which survey data are unavailable.
In addition, our method, which uses deep learning to capture economy-related features from diverse datasets, can accurately measure local well-being over both space and time.

Remote sensing data
Satellite remote sensing data contributes substantially to our understanding of economic characteristics and social development. The potential of disparate satellite data to estimate socioeconomic factors has been proven 12 . Here, we use the following satellite data: nighttime light data, Google Earth images, Sentinel-1, Sentinel-2 and Landsat 8 images, and LAI data.

Deep learning framework
For remote sensing image prediction in highly data-limited settings, the transfer learning method is generally used to make the network learn better and more
Due to the highly data-limited settings in the NPCs, we choose the transfer learning method. First, we extract features using nighttime light as the ground truth.
Finally, we fit a ridge regression on the extracted features and the statistical yearbook data to obtain the predicted indicators. We use a more advanced network and add more types of daytime images, which supply additional information on relevant economic factors and bring the network's predictions closer to the true values.
We present a deep learning model to estimate yearly GDP and population of each
The classification term is used to classify the images into different light intensity categories. The regression term is designed to estimate the nighttime light intensity value from the input satellite images.

Spatial and channel-wise attention.
Attention is the means of allocating available computing resources to the most useful components of a signal 22 . Spatial attention drives the deep learning model to focus more on the regions of interest, which helps to generate representative features. For the economic assessment, we use the area of the nighttime light image whose intensity value is greater than zero as our attention region.
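The attention region defined above can be sketched as a binary mask. How the mask enters the network is not recoverable from the text, so the element-wise weighting of a feature map below is an assumption, as are the function names, the array shapes and the requirement that the mask and the feature map share the same spatial resolution.

```python
import numpy as np


def attention_mask(night_light):
    """Binary mask: 1.0 where nighttime light intensity is > 0, else 0.0."""
    return (np.asarray(night_light) > 0).astype(np.float32)


def apply_spatial_attention(features, mask):
    """Weight a (channels, H, W) feature map by an (H, W) mask.

    Element-wise weighting is an illustrative assumption, not the
    documented mechanism of the original network.
    """
    return features * mask[np.newaxis, :, :]


# Illustrative 2x2 nighttime light tile with a single lit pixel.
night_light = np.array([[0.0, 3.5], [0.0, 0.0]])
features = np.ones((4, 2, 2), dtype=np.float32)
attended = apply_spatial_attention(features, attention_mask(night_light))
```

Only the lit pixel keeps its feature values in every channel; all other positions are set to zero.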
We up-sample the feature maps by two deconvolution layers behind the improved
Multi-loss. The ground-truth class labels of each remote sensing image are determined according to the total nighttime light intensities. The classifier is a 2-way fully-connected layer behind the spatial attention term with 1024 neurons and a softmax activation layer. In the classification term, the cross-entropy loss is employed, which is formulated as:
$$L_{cls} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j} y_{ij}\,\log \hat{p}_{ij}$$
where $y_{ij} \in [0, 1]$ denotes the j-th dimension of the ground-truth class label vector for the training image i, $\hat{p}_{ij}$ is the output of the softmax layer, and n is the batch size.
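For the regression term, the mean absolute error (MAE) loss is employed to measure the error of the predicted nighttime light intensities. The MAE loss is formulated as:
$$L_{reg} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{b}_{i} - b_{i}\right|$$
where $\hat{b}_{i}$ is the output of the regression layer, $b_{i}$ is the corresponding ground-truth nighttime light intensity, and n is the batch size.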
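The two loss terms can be written out as a minimal sketch in plain Python. Averaging over the batch size n follows the definitions in the text, while the function names, the small clipping constant and the example values are hypothetical. The text does not say how the two terms are weighted when combined, so they are kept separate here.

```python
import math


def cross_entropy_loss(labels, probs, eps=1e-12):
    """Mean cross-entropy over a batch.

    labels[i][j]: ground-truth label of image i for class j.
    probs[i][j]:  softmax output of image i for class j.
    eps clips probabilities to avoid log(0) (an implementation detail
    added here, not taken from the text).
    """
    n = len(labels)
    total = 0.0
    for y_i, p_i in zip(labels, probs):
        total -= sum(y * math.log(max(p, eps)) for y, p in zip(y_i, p_i))
    return total / n


def mae_loss(targets, outputs):
    """Mean absolute error between true and predicted light intensities."""
    n = len(targets)
    return sum(abs(o - t) for t, o in zip(targets, outputs)) / n


# Illustrative batch of two images with a 2-way classifier.
cls_loss = cross_entropy_loss([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]])
reg_loss = mae_loss([1.0, 3.0], [2.0, 1.0])
```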
The classification branch and the regression branch each generate a 1024-dimensional feature vector, and we concatenate the two vectors together as the input to the ridge regression.
Ridge regression. The deep neural network produces a 2,048-dimensional feature vector by a global average pooling layer from each satellite image. The feature vectors of all images of one county are averaged into a single vector. We separately use the normalized GDP and population along with the corresponding image vector to train the regularized ridge regression model. Ridge regression is a linear regression model that imposes a squared (L2) penalty on the magnitude of the linear coefficients. It is suitable when the features are highly correlated. In our method, the feature vector has 2,048 dimensions, so regularization is used to reduce over-fitting.
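The feature aggregation and ridge regression steps can be sketched as follows. This is a closed-form illustration under stated assumptions, not the original implementation: the function names, the regularization strength `alpha` and the synthetic 8-dimensional demonstration data are hypothetical, and the intercept is handled by centering.

```python
import numpy as np


def county_vector(cls_features, reg_features):
    """Concatenate the two 1024-d branch features of each image into a
    2,048-d vector, then average over all images of one county.

    cls_features, reg_features: arrays of shape (num_images, 1024).
    """
    per_image = np.concatenate([cls_features, reg_features], axis=1)
    return per_image.mean(axis=0)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge on centered data: w = (X^T X + alpha I)^-1 X^T y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


def predict(X, weights, intercept):
    """Linear prediction from county feature vectors."""
    return X @ weights + intercept


# Shape check with placeholder features for a county with 3 images.
v = county_vector(np.ones((3, 1024)), np.zeros((3, 1024)))

# Synthetic demonstration in 8 dimensions (real vectors have 2,048).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
true_w = rng.normal(size=8)
y = X @ true_w + 0.5
weights, intercept = fit_ridge(X, y, alpha=1e-8)
```

With 2,048 features and far fewer counties than features, a larger `alpha` would be needed; in practice it would be chosen by cross-validation.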
Training and testing algorithm. We train the attention-driven network model on an