3.1 Data
As the Rural Road Construction Plan (RRCP) was implemented during 2006–2020, a data set covering this period is appropriate for our research. To analyze the impact of rural road projects financed by different sources on off-farm work, we must examine the dynamics of residents' income, which requires a panel data set. For this study, we chose the rural sample of the Chinese Household Income Project (CHIP), which is provided by the China Institute for Income Distribution. The CHIP survey was conducted in 1989, 1996, 2003, 2008, 2014, and 2019, and the resulting data form databases covering 1988, 1995, 2002, 2007/2008, 2013, and 2018, respectively.
We use individual-level and village-level data from the CHIP2007 and CHIP2008 rural household survey databases. This data set forms a representative sample of rural households for each year, using 10 families each from 800 rural communities across the nine provinces of Hebei, Jiangsu, Zhejiang, Anhui, Henan, Hubei, Guangdong, Chongqing, and Sichuan. We can compose a panel data set of these two years of sample data from CHIP2007 and CHIP2008.
In the CHIP datasets, information on employment is only available for those over the age of 16. Therefore, we selected all individuals with off-farm work information as our sample. After combining the village-level data with the individual data and eliminating invalid data, our database contained a total of 20,791 observations (10,196 individuals from CHIP2007 and 10,595 individuals from CHIP2008). We constructed two dependent variables from the CHIP individual data. First, income from off-farm work was calculated based on the respondents' answer to the questionnaire item "What is the average monthly total income from all paid off-farm work (including self-employment)?". Second, intention to migrate out for work was measured based on the respondents' answer to the questionnaire item "Are you planning to migrate out for work or business in the near future?".
The village-level CHIP data report whether each village had road improvement projects in 2005, 2006, 2007, and 2008. For 2007 and 2008, they also document the specific source of funding for the road improvement projects in each village, distinguishing between self-financing by the village and external funds (grants from higher-level governments and other funds). Based on this information, we classified villages into the following four categories to construct our treatment variables: 1) received no funding for road improvement projects; 2) had road improvement projects funded by external funds only; 3) had road improvement projects funded by both external funds and village self-financing; and 4) had road improvement projects funded by village self-financing only.
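The four-way classification can be sketched in a few lines of Python. This is only an illustration: the three boolean flags and all function and class names are hypothetical stand-ins, not actual CHIP variable names, and the `treatment_dummy` helper assumes that each treatment variable compares one funding category against all other villages.

```python
from enum import Enum


class RoadFunding(Enum):
    """The four village categories described in the text."""
    NONE = 1           # no road improvement project, or no funding for one
    EXTERNAL_ONLY = 2  # external funds only
    MIXED = 3          # external funds plus village self-financing
    SELF_ONLY = 4      # village self-financing only


def classify_village(has_project: bool, external_funds: bool,
                     self_financed: bool) -> RoadFunding:
    """Map three hypothetical survey flags to one funding category."""
    if not has_project or not (external_funds or self_financed):
        return RoadFunding.NONE
    if external_funds and self_financed:
        return RoadFunding.MIXED
    if external_funds:
        return RoadFunding.EXTERNAL_ONLY
    return RoadFunding.SELF_ONLY


def treatment_dummy(category: RoadFunding, treated: RoadFunding) -> int:
    """Return 1 if the village is in the treated category, 0 otherwise."""
    return int(category is treated)


# Example: a village with a project paid for by external funds only.
village = classify_village(has_project=True, external_funds=True,
                           self_financed=False)
print(village.name, treatment_dummy(village, RoadFunding.EXTERNAL_ONLY))
```

Keeping the category as a single enumerated value, and deriving each dummy from it, guarantees that the four categories stay mutually exclusive.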
Intuitively, several factors affect the rural residents’ off-farm work; thus, the features of the individual and village need to be controlled. The variables for individual-level control include gender, level of education, age, ethnicity, health status, marital status, and household size. In rural areas, residents over 16 years old are the primary decision makers, so their demographics should greatly affect their off-farm work decisions. This is the theoretical basis behind our controlling for these factors in our empirical model.
We also control for several village-level variables: the population, the percentage share of the village labor force working in the agricultural sector, the average income level, the daily wage of a temporary worker in the village, whether a natural disaster occurred in the respective year together with the percentage loss it caused relative to the previous year's agricultural production, and the distance from the village to the nearest township government. Because rural society relies heavily on social capital, the village population can reflect the likelihood that a person will have access to acquaintances who can help with off-farm work. The percentage of the village labor force working in the agricultural sector reflects the structure of the rural economy and whether residents are dependent on agricultural work. The distance from the village to the nearest township government reflects the feasibility and difficulty of going to the township for off-farm work. The daily wage of a temporary worker in the village provides an indication of the degree of market development in the village. The average income level of the village reflects the state of economic development of households in the community, while the occurrence of natural disasters temporarily affects income.
As shown in Table 1, we divided the villagers into a treatment group and a control group depending on whether they belonged to villages with road projects financed by a specific source in 2007. For example, those belonging to villages with externally funded road projects were placed in the treatment group, and those without were placed in the control group. The average monthly off-farm income of rural residents in the treatment group in 2007 was 1,363 yuan, while that of the control group was 1,397 yuan. In the 2007 sample, 660 rural residents belonged to the treatment group and 7,686 belonged to the control group, so the treatment group accounted for 7.9% of the total. In the 2008 sample, the rural residents were likewise divided into treatment and control groups on the basis of whether they belonged to a village with externally funded road projects in 2007. The average monthly off-farm income of rural residents in the treatment group in 2008 increased to 1,499 yuan, while that of the control group increased to 1,487 yuan. The treatment group contained 631 rural residents and the control group contained 8,035, with the treatment group accounting for 7.3%.
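As a quick arithmetic check on the figures quoted in the paragraph above, the following Python sketch computes the raw, unadjusted difference-in-differences of the four group means and the treatment-group shares. It is only a back-of-the-envelope illustration: it uses no control variables or matching, so it is not one of the estimates reported in this study, and the function name `raw_did` is a hypothetical helper.

```python
def raw_did(treated_pre, treated_post, control_pre, control_post):
    """Unadjusted difference-in-differences of four group means."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Average monthly off-farm income (yuan) for villages with vs. without
# externally funded road projects, as quoted in the text.
did_income = raw_did(treated_pre=1363, treated_post=1499,
                     control_pre=1397, control_post=1487)
print(did_income)  # 46

# Treatment-group shares of the 2007 and 2008 samples.
share_2007 = 660 / (660 + 7686)
share_2008 = 631 / (631 + 8035)
print(f"{share_2007:.1%}, {share_2008:.1%}")  # 7.9%, 7.3%
```

The treatment group's mean income rose by 136 yuan against 90 yuan for the control group, which gives the raw gap of 46 yuan.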
Table 1. Descriptive statistics for monthly off-farm income and intention to migrate for work.

| Variables | 2007 Treated | 2007 Control | 2008 Treated | 2008 Control |
|---|---|---|---|---|
| Externally funded road projects | | | | |
| Off-farm income | 1363 | 1397 | 1499 | 1487 |
| Off-farm income (logarithm) | 7.040 | 7.044 | 7.136 | 7.085 |
| Observations | 660 | 7,686 | 631 | 8,035 |
| Intention to migrate for work | 0.257 | 0.254 | 0.227 | 0.187 |
| Observations | 447 | 5,293 | 401 | 5,019 |
| Self-financed road projects | | | | |
| Off-farm income | 1456 | 1382 | 1576 | 1471 |
| Off-farm income (logarithm) | 7.029 | 7.047 | 7.147 | 7.077 |
| Observations | 1,326 | 7,020 | 1,427 | 7,239 |
| Intention to migrate for work | 0.170 | 0.272 | 0.124 | 0.203 |
| Observations | 1,010 | 4,730 | 917 | 4,503 |
| Mix-financed road projects | | | | |
| Off-farm income | 1366 | 1407 | 1407 | 1526 |
| Off-farm income (logarithm) | 7.040 | 7.046 | 7.068 | 7.098 |
| Observations | 2,676 | 5,670 | 2,786 | 5,880 |
| Intention to migrate for work | 0.246 | 0.258 | 0.204 | 0.183 |
| Observations | 1,838 | 3,902 | 1,700 | 3,720 |

Source: CHIP2007/2008.
Note: The numbers of observations differ between the two years because the panel is unbalanced: some observations are missing and some are new.
Table 1 presents the descriptive statistics for the dependent variables in the treatment and control groups. The descriptive statistics for the other control variables are presented in Table 2. Two of the control variables, namely the average household income in the village and the distance from the village to the nearest township government, are ordinal variables. The survey question used to collect data on the average household income in the village was "In 2007, which of the following ranges did the per capita annual net income of farmers in this village fall into?" Respondents were presented with 19 response options, ranging from "Below 500 yuan" to "Above 20,000 yuan". The survey question used to collect data on the distance from the village to the nearest township government was "Based on the usual mode of travel, approximately how long does it take to travel from your village to the nearest seat of township government?" Respondents were provided with 5 response options, ranging from "Within 15 minutes" to "More than 90 minutes".
Table 2. Descriptive statistics of control variables.

| Variables | Observations | Mean | SD | Minimum | Maximum |
|---|---|---|---|---|---|
| Individual level | | | | | |
| Gender (male=1) | 20,787 | 0.628 | 0.483 | 0 | 1 |
| Education | 20,791 | 4.282 | 4.626 | 0 | 31 |
| Age | 20,775 | 36.044 | 12.254 | 17 | 97 |
| Ethnicity (minorities=1) | 20,791 | 0.008 | 0.09 | 0 | 1 |
| Health status (level) | 20,783 | 1.849 | 0.703 | 1 | 5 |
| Marital status | 20,771 | 2.358 | 2.189 | 1 | 6 |
| Household size | 20,791 | 4.489 | 1.454 | 1 | 18 |
| Experience of migrating for work (yes=1) | 20,791 | 0.618 | 0.486 | 0 | 1 |
| Village level | | | | | |
| Average household income in the village (level) | 20,782 | 11.648 | 3.453 | 1 | 18 |
| Population (logarithm) | 20,791 | 7.668 | 0.642 | 5.209 | 10.535 |
| Labor force working in agriculture sector (%) | 20,791 | 45.679 | 24.648 | 0 | 99 |
| Money earned in a day as a casual worker in the village | 20,791 | 43.869 | 13.375 | 0 | 120 |
| Distance from the village to the nearest township government (level) | 20,791 | 1.825 | 0.81 | 1 | 5 |
| Interaction term of natural disaster occurrence in the respective year and agricultural production loss (%) | 20,664 | 7.784 | 11.92 | 0 | 90 |

Source: CHIP2007/2008.
3.2 Econometric model specification
We specified the empirical model below:
$$Y_{i,c,t} = \beta_{0} + \beta_{1} A_{t} + \beta_{2} D_{r,c} + \beta_{3}\left(A_{t} \cdot D_{r,c}\right) + \beta_{4}^{\prime} z_{i,c,t} + \beta_{5}^{\prime} I_{c,t} + v_{p} + e_{i,c,t}$$
for i = 1, …, N; t = 0, 1; c = 0, …, V; and r = 1, 2, 3. (1)
The dependent variable \(Y_{i,c,t}\) in Eq. (1) represents the monthly off-farm income or the intention to migrate out for work of the ith rural resident in the cth village at time t. \(D_{r,c}\) is a dummy variable distinguishing the treatment and control groups: it indicates whether village c, where rural resident i lives, received a rural road improvement project financed by funding source r, and it takes the value of 1 when the sample belongs to the treatment group. According to the CHIP database, we can define \(D_{r,c}\) for three categories. For \(D_{1,c}\), villages with only externally funded road improvement projects (EFRIP) were assigned a value of 1, and all other villages a value of 0. For \(D_{2,c}\), villages with road improvement projects financed by both external funds and self-financing (mix-financed, MFRIP) were assigned a value of 1, and all other villages a value of 0. For \(D_{3,c}\), villages with only self-financed road improvement projects (SFRIP) were assigned a value of 1, and all other villages a value of 0. For each funding source, we treated villages with that type of project as the treatment group and placed all other categories in the control group, rather than restricting the control group to villages with no road projects. The reason is to avoid selection bias.
Out of the 652 villages included in CHIP2007, 52 (about 8%) received a rural road improvement project financed only by external funds in 2007; this percentage was 3.5% and 3.9% in 2005 and 2006, respectively. A total of 101 villages (about 15.5%) received a solely self-financed rural road improvement project in 2007; this percentage was 28.1% and 24.0% in 2005 and 2006, respectively. A total of 207 villages (about 31.8%) received a mix-financed (self-financed and externally funded) rural road improvement project in 2007; this percentage was 20.9% and 22.5% in 2005 and 2006, respectively. The remaining 292 villages (about 44.8%) received no rural road improvement project or no funding for one in 2007; this percentage was 47.4% and 49.6% in 2005 and 2006, respectively. From 2005 to 2007, the percentage of villages with road improvement projects first decreased and then increased. The share of villages with externally funded projects roughly doubled and the share with MFRIP increased significantly, whereas the percentage of villages with SFRIP decreased significantly. The increase in EFRIP and MFRIP over these years was likely due to the implementation of the RRCP.
\(A_{t}\) is the year dummy variable, which takes the value of 1 when the sample belongs to the follow-up period (2008). In the final empirical analysis, we primarily focus on the coefficient of the interaction term \(A_{t} \cdot D_{r,c}\), which represents the effect of the implementation of the rural road improvement project. As our data contain only two time periods, under the model established in Eq. (1), we use ordinary least squares (OLS), fixed-effects (FE), and logit regression models to obtain unbiased and consistent regression results. \(z_{i,c,t}\) and \(I_{c,t}\) are the individual-level and village-level control variables mentioned previously; Table 2 shows their descriptive statistics.
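To make the role of the interaction coefficient concrete, the following Python sketch fits Eq. (1) without any control variables or province dummies on a tiny made-up data set. It checks that, in this stripped-down case, the OLS coefficient on the interaction term equals the difference-in-differences of the four group means. The data, the helper functions `solve` and `ols`, and the omission of all controls are illustrative assumptions, not the estimation code used in this study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data: (A_t, D_rc, outcome).
rows = [
    (0, 0, 10.0), (0, 0, 12.0),  # control group, base period
    (1, 0, 13.0), (1, 0, 15.0),  # control group, follow-up period
    (0, 1, 11.0), (0, 1, 13.0),  # treatment group, base period
    (1, 1, 17.0), (1, 1, 19.0),  # treatment group, follow-up period
]
# Regressors: constant, A_t, D_rc, and the interaction A_t * D_rc.
X = [[1.0, a, d, a * d] for a, d, _ in rows]
y = [out for _, _, out in rows]
beta = ols(X, y)


def mean(a, d):
    """Mean outcome of the cell with period a and group d."""
    vals = [out for aa, dd, out in rows if (aa, dd) == (a, d)]
    return sum(vals) / len(vals)


did_of_means = (mean(1, 1) - mean(0, 1)) - (mean(1, 0) - mean(0, 0))
print(round(beta[3], 6), did_of_means)
```

In this toy data the treatment group gains 6 and the control group gains 3, so both quantities equal 3 up to floating-point error. Once control variables are added, the interaction coefficient is a regression-adjusted version of the same comparison.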
In addition to the above control variables, we include a dummy variable \(v_{p}\) for each observation's province in our model; this controls for the unobservable effects related to different provinces and administrative regions. Finally, \(e_{i,c,t}\) is the error term in the model.
In our DID analysis, because we only have a two-year sample, we cannot test the parallel trends assumption using a standard method, such as plotting the dependent variable before the treatment. Thus, we need another way to ensure comparability between the treatment and control groups in the base period. We found a significant difference in the features of households and villages between the treatment group and the control group. This may be due to specific differences between individuals and villages. It may also be due to self-selection bias: after the implementation of the rural road improvement project, rural households could choose to remain in or leave their village.
To deal with this difference in features, we conduct a PSM-DID analysis, with the estimator defined as follows:
$$\text{PSM-DID} = E\left[Y_{1ic}^{T} - Y_{0ic}^{T} \mid p\left(I_{0c}\right),\ D_{r,c} = 1\right] - E\left[Y_{1ic}^{C} - Y_{0ic}^{C} \mid p\left(I_{0c}\right),\ D_{r,c} = 0\right]$$
(2)
Here, \(Y_{1ic}^{T}\) and \(Y_{0ic}^{T}\) are the dependent variables of the treatment group in the follow-up and base periods, respectively; \(Y_{1ic}^{C}\) and \(Y_{0ic}^{C}\) are the dependent variables of the control group in the follow-up and base periods, respectively. \(p(I_{0c})\) is the propensity score in the base period. The village-level covariates observed in the base period (2007), \(I_{0c}\), are used to predict the probability that a sample belongs to the treatment group. We estimate logit models and use the fitted values to calculate the conditional probability of a sample entering the treatment group; this conditional probability is the propensity score \(p(I_{0c})\). Once the propensity scores are estimated, we employ radius matching to create a matched sample of treated and control units with similar propensity scores. With the matched sample in hand, we apply a DID analysis to estimate the treatment effect, incorporating the propensity score weights obtained from the matching.
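The matching step can be illustrated with a short Python sketch. It assumes the propensity scores have already been estimated and shows only radius matching: each treated unit is paired with every control unit whose score lies within a fixed radius. The scores, the radius of 0.05, and the equal-weighting rule in `control_weights` are hypothetical choices made for illustration; the text does not state the radius or the exact weighting scheme used.

```python
def radius_match(treated, controls, radius):
    """Match each treated unit to all controls within `radius` of its score.

    `treated` and `controls` map unit ids to propensity scores. Treated
    units with no control inside the radius are left unmatched.
    """
    matches = {}
    for t_id, t_score in treated.items():
        within = [c_id for c_id, c_score in controls.items()
                  if abs(c_score - t_score) <= radius]
        if within:
            matches[t_id] = within
    return matches


def control_weights(matches):
    """Spread a total weight of 1 per treated unit over its matched controls."""
    weights = {}
    for c_ids in matches.values():
        for c_id in c_ids:
            weights[c_id] = weights.get(c_id, 0.0) + 1.0 / len(c_ids)
    return weights


# Hypothetical propensity scores for three treated and four control units.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
controls = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}

matches = radius_match(treated, controls, radius=0.05)
weights = control_weights(matches)
print(matches)
print(weights)
```

Here `t3` has no control within the radius and drops out of the matched sample, while `c4` is never matched and receives no weight. A smaller radius gives closer matches but discards more units.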
Since CHIP2007 and CHIP2008 form an unbalanced panel data set, we also use fixed-effects DID estimates to exploit the panel data structure. We apply a within transformation as follows:
$$Y_{i,c,t} - \bar{Y}_{i,c} = \beta_{1}^{\prime}\left(A_{t} - \bar{A}_{t}\right) + \beta_{2}^{\prime}\left(D_{r,c} - \bar{D}_{r,c}\right) + \beta_{3}^{\prime}\left(A_{t} \cdot D_{r,c} - \bar{A}_{t} \cdot \bar{D}_{r,c}\right) + \beta_{4}^{\prime\prime}\left(z_{i,c,t}^{\prime} - \bar{z}_{i,c}^{\prime}\right) + \beta_{5}^{\prime\prime}\left(I_{c,t}^{\prime} - \bar{I}_{c}^{\prime}\right) + \left(e_{i,c,t} - \bar{e}_{i,c}\right)$$
(3)
Here, \(z_{i,c,t}^{\prime}\) and \(I_{c,t}^{\prime}\) are the time-varying individual-level and village-level control variables, respectively. \(\bar{Y}_{i,c}\), \(\bar{A}_{t}\), \(\bar{D}_{r,c}\), \(\bar{z}_{i,c}^{\prime}\), \(\bar{I}_{c}^{\prime}\), and \(\bar{e}_{i,c}\) are the time means of the respective variables for individual i in village c. Eq. (3) controls for unobserved time-invariant individual-specific characteristics, as well as common shocks that are assumed to derive mainly from macroeconomic policies and institutional reforms.
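The within transformation itself is simple to sketch in Python: each variable has its individual-specific time mean subtracted, so anything that does not vary over time for a given individual becomes zero. The panel below, the column names, and the time-invariant trait `ability` are hypothetical and serve only to illustrate the mechanics on an unbalanced panel.

```python
from collections import defaultdict


def within_transform(panel, columns):
    """Subtract each individual's time mean from the listed columns.

    `panel` is a list of dicts that each carry an "id" key. Returns a new
    list of demeaned rows in the same order.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in panel:
        counts[row["id"]] += 1
        for col in columns:
            sums[row["id"]][col] += row[col]
    demeaned = []
    for row in panel:
        n = counts[row["id"]]
        new = {"id": row["id"]}
        for col in columns:
            new[col] = row[col] - sums[row["id"]][col] / n
        demeaned.append(new)
    return demeaned


# Hypothetical unbalanced panel: individuals 1 and 2 appear in both years,
# individual 3 only in the follow-up year.
panel = [
    {"id": 1, "A": 0, "y": 7.0, "ability": 2.0},
    {"id": 1, "A": 1, "y": 7.4, "ability": 2.0},
    {"id": 2, "A": 0, "y": 6.8, "ability": 5.0},
    {"id": 2, "A": 1, "y": 6.9, "ability": 5.0},
    {"id": 3, "A": 1, "y": 7.1, "ability": 1.0},
]
out = within_transform(panel, ["A", "y", "ability"])
for row in out:
    print(row)
```

The time-invariant trait is demeaned to zero for everyone, which is why such characteristics drop out of the fixed-effects estimate. Individual 3, observed only once, is demeaned to zero in every column and so contributes nothing to the within estimate.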