Descriptive statistics
In this study, descriptive statistics were computed for the explained variable, the core explanatory variable, the components of the factor indicator system, the control variables, the mediating variables, and the instrumental variables, as reported in Table 3. The digital divide (Dig) has a minimum of 3.01, a maximum of 96.26, and a mean of 35.53. These figures suggest that the digital divide varies considerably within the sample, with a broad range of values and a relatively low mean.
Table 3
Descriptive statistical analysis.
| Variable Name | N | Mean | SD | Min | Max |
| --- | --- | --- | --- | --- | --- |
| Mainwork | 11965 | 0.633 | 0.482 | 0 | 1 |
| Dig | 11965 | 35.53 | 11.78 | 3.01 | 96.26 |
| Phone | 11965 | 0.954 | 0.211 | 0 | 1 |
| Eaccount | 11965 | 0.925 | 0.263 | 0 | 1 |
| Internetfee | 11965 | 5.346 | 0.788 | 0 | 8.517 |
| Onlinefee | 11965 | 7.876 | 1.375 | 2.303 | 12.79 |
| Onlinefre | 11965 | 6.828 | 1.832 | 1 | 10 |
| Age | 11965 | 39.29 | 9.866 | 18 | 60 |
| Sex | 11965 | 0.572 | 0.495 | 0 | 1 |
| Edu | 11965 | 4.77 | 1.784 | 1 | 9 |
| Health | 11965 | 2.235 | 0.821 | 1 | 5 |
| Marri | 11965 | 0.825 | 0.38 | 0 | 1 |
| Familynumb | 11965 | 3.91 | 1.459 | 1 | 15 |
| House | 11965 | 0.925 | 0.263 | 0 | 1 |
| ROA | 11965 | 0.431 | 0.439 | 0 | 13.29 |
| Infor | 11951 | 2.134 | 0.999 | 1 | 5 |
| Risk | 10941 | 2.157 | 1.117 | 1 | 5 |
| Internet | 11965 | 0.325 | 0.051 | 0.232 | 0.436 |
| Cdig | 11965 | 36.36 | 6.553 | 21.91 | 81.5 |
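As a minimal sketch of how the five summary columns of Table 3 can be obtained for a single variable, the Python snippet below computes N, mean, SD, minimum, and maximum while skipping missing answers. The function name `describe` and the toy `risk` column are hypothetical; the smaller N reported for Infor and Risk is presumably due to missing responses, which is an assumption rather than something the table states.

```python
from statistics import mean, stdev

def describe(values):
    """Return N, mean, SD, min and max, skipping missing (None) entries."""
    observed = [v for v in values if v is not None]
    return {
        "N": len(observed),
        "Mean": mean(observed),
        "SD": stdev(observed),  # sample standard deviation (n - 1 denominator)
        "Min": min(observed),
        "Max": max(observed),
    }

# Hypothetical toy column with one missing response (not the survey data).
risk = [1, 2, None, 5, 3, 2]
summary = describe(risk)
print(summary)
```

Dropping missing values variable by variable, as done here, yields a different N per row, which is the pattern visible in Table 3.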
Baseline regression and endogeneity test
Benchmark regression. Columns (1) and (2) of Table 4 present the estimated impact of the digital divide on the nature of employment. After control variables are introduced in column (2), the marginal effect of the digital divide on formal employment is approximately −0.156%, significant at the 1% level. This suggests that the digital divide lowers the likelihood that workers enter formal employment, which supports Hypothesis H1 proposed in this study.
In addition, the coefficient on the age of the household head is positive and significant at the 1% level, indicating that the probability of formal employment increases with age, possibly because social experience accumulates with age and helps workers secure formal jobs. Gender does not affect the nature of individual employment. The higher the level of education, the higher the probability of formal employment, because education is closely related to human capital. Health is also an important form of human capital[25]; the coefficient on health is significantly negative, indicating that the healthier the worker is, the higher the probability of formal employment. Being married makes people more inclined to look for stable jobs, so the probability of formal employment is higher for married workers. Household size significantly reduces the probability of formal employment, possibly because larger households include young children and elderly parents, who require more flexible time for caregiving, so workers choose informal employment to balance work and family. Finally, the higher the household gearing ratio, the lower the probability that workers enter formal employment.
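Because a Probit coefficient is not itself a change in probability, the marginal effect quoted above depends on where it is evaluated. The sketch below illustrates the standard formula for a continuous regressor, φ(x'β)·β, using the Dig coefficient from column (2) of Table 4. The index values at which it is evaluated are hypothetical, and the snippet does not attempt to reproduce the marginal effect reported in the text, which depends on the actual sample.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def probit_marginal_effect(coef, linear_index):
    """dP(y=1)/dx for a continuous regressor in a probit model: phi(x'b) * b."""
    return normal_pdf(linear_index) * coef

dig_coef = -0.016  # Dig coefficient, Table 4, column (2)

# Hypothetical values of the linear index x'b at which to evaluate the effect.
for index in (-1.0, 0.0, 1.0):
    effect = probit_marginal_effect(dig_coef, index)
    print(f"index={index:+.1f}  marginal effect={effect:.5f}")
```

The effect is largest in absolute value when the index is zero (predicted probability of one half) and shrinks toward the tails, which is why average marginal effects are usually reported alongside raw Probit coefficients.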
Table 4
Baseline regression and endogeneity test.
| Variables | (1) Probit | (2) Probit | (3) IVProbit |
| --- | --- | --- | --- |
| Dig | -0.030*** | -0.016*** | -0.038*** |
| | (0.001) | (0.001) | (15.915) |
| Age | | 0.013*** | 0.013*** |
| | | (0.002) | (8.049) |
| Sex | | -0.023 | 0.001 |
| | | (0.027) | (0.022) |
| Edu | | 0.396*** | 0.334*** |
| | | (0.009) | (29.674) |
| Health | | -0.052*** | -0.032* |
| | | (0.016) | (1.958) |
| Marri | | 0.209*** | 0.095** |
| | | (0.041) | (2.292) |
| Familynumb | | -0.095*** | -0.090*** |
| | | (0.010) | (9.276) |
| House | | 0.187*** | 0.206*** |
| | | (0.050) | (4.281) |
| ROA | | -0.083*** | -0.135*** |
| | | (0.030) | (4.415) |
| Constant | 1.411*** | -1.190*** | -0.102 |
| | (0.040) | (0.118) | (0.645) |
| Obs | 11965 | 11965 | 11965 |
| Pseudo R2 | 0.054 | 0.227 | 0.231 |
| First-stage F-value | | | 513.72 |
| Wald/DWH test | | | 106.22 |

Note: ***, **, * indicate significance at the 1%, 5%, and 10% levels, respectively; standard errors are in parentheses.
Endogeneity test. The benchmark estimates above may suffer from endogeneity arising from reverse causality and omitted variables. First, regarding reverse causality: on the one hand, stronger digital skills help workers integrate into contemporary society and keep pace with the technological innovations that characterize modern development, which protects them from losing formal employment opportunities. On the other hand, formal employment itself requires workers to possess stronger digital skills. As digital technology becomes increasingly pervasive, formally employed workers continually update their knowledge of digital information technology, such as the Internet, whereas for those in informal employment the digital divide continues to widen. Second, regarding omitted variables: although this study controls for attributes such as age and gender, workers' employment decisions may still be influenced by unobservable factors, such as personality traits and upbringing.
For this reason, this paper adopts an instrumental variables approach to address endogeneity. The literature on the digital divide at the individual level is sparse; existing studies of individual behavioral decisions rely on measures such as individual digital divide gaps, Internet use, and mobile payment use. These studies address endogeneity with instrumental variables such as provincial Internet penetration[26], the county-level digital divide, community-level monthly household postal and telecommunications costs[27], and smartphones[28].
Because the digital divide indicator system in this paper already contains indicators such as "smartphone," "Internet access cost," and "electronic payment account," this paper draws on these studies and selects the provincial Internet penetration rate and the community-level digital divide (computed excluding the worker's own value) as instrumental variables. On the one hand, the higher the Internet penetration rate, the higher the probability that workers use the Internet and the larger the digital divide gap among workers may be, while a worker's individual digital divide is easily influenced by the surrounding community; the relevance condition is therefore satisfied. On the other hand, Internet penetration is measured with macro-level data, and the community-level digital divide excludes the worker's own influence; the exogeneity condition is therefore satisfied. The selected instrumental variables are thus theoretically feasible.
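The community-level instrument described above averages the digital divide over a worker's community while leaving out that worker. A minimal sketch of such a leave-one-out mean is given below; the function name, the record layout, and the toy values are hypothetical, and the paper does not state how single-member communities are handled (the sketch returns `None` for them).

```python
from collections import defaultdict

def leave_one_out_means(records):
    """For each (community, dig) record, average dig over the *other* members
    of the same community; None when the community has a single member."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for community, dig in records:
        totals[community] += dig
        counts[community] += 1
    result = []
    for community, dig in records:
        others = counts[community] - 1
        result.append((totals[community] - dig) / others if others > 0 else None)
    return result

# Hypothetical toy data: (community id, individual digital divide score).
sample = [("A", 30.0), ("A", 40.0), ("A", 50.0),
          ("B", 20.0), ("B", 60.0), ("C", 35.0)]
cdig = leave_one_out_means(sample)
print(cdig)
```

Subtracting the worker's own score from the community total before dividing is what removes the mechanical correlation between the instrument and the individual's own error term.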
The results from the first-stage regression (Table 4, Column (3)) indicate that the F-statistic is far above ten. The CLR, K-J, AR, and Wald tests are all statistically significant at the 1% level, allowing us to reject the null hypothesis that "weak instrumental variables exist"[29]. This suggests that the instrumental variables chosen for this study are strongly correlated with the endogenous variable. The second-stage regression results show a significantly negative coefficient on the digital divide (Table 4, Column (3)), larger in absolute value than the coefficient of the baseline regression (Table 4, Column (2)). This implies that the negative impact of the digital divide on formal employment is likely to be underestimated if endogeneity is not taken into account. Furthermore, the DWH endogeneity test results indicate that we can reject the hypothesis of "all variables being exogenous" at the 1% statistical significance level. In conclusion, the selected instrumental variables are appropriate, and the study's findings remain robust.
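The logic of the two diagnostics above (a first-stage F-statistic above ten, and an instrumented coefficient that is more negative than the uninstrumented one) can be illustrated on synthetic data. The sketch below is a linear, single-instrument analogue and is not the IV-Probit estimator used in this paper; the data-generating process, the true coefficient of −0.5, and all function names are assumptions made for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

def simulate(n=5000, beta=-0.5, seed=0):
    """Synthetic data: u is an unobserved confounder, z a valid instrument."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y

def first_stage_f(z, x):
    """F-statistic from regressing x on a single instrument z (equals t^2)."""
    n = len(z)
    slope = cov(z, x) / cov(z, z)
    intercept = mean(x) - slope * mean(z)
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    sigma2 = rss / (n - 2)          # residual variance
    szz = cov(z, z) * (n - 1)       # sum of squared deviations of z
    return slope ** 2 * szz / sigma2

z, x, y = simulate()
beta_ols = cov(x, y) / cov(x, x)    # biased toward zero by the confounder u
beta_iv = cov(z, y) / cov(z, x)     # just-identified IV (Wald) estimator
f_stat = first_stage_f(z, x)
print(f"OLS={beta_ols:.3f}  IV={beta_iv:.3f}  first-stage F={f_stat:.1f}")
```

In this simulation the confounder pushes the uninstrumented estimate toward zero, so the IV estimate is more negative, which mirrors the direction of the change between columns (2) and (3) of Table 4.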
Robustness test
This section constructs alternative variables from different perspectives to further test the reliability of the findings. The core explanatory variable is replaced with a digital divide dummy (Dig_dummy) and a digital divide type (Dig_type).
First, if a household head has a negative response on the indicators of the above factor model (i.e., no smartphone, no third-party payment account, no Internet costs incurred last year, no online purchases made last year), the digital divide dummy variable (Dig_dummy) is set to 1; otherwise, it is 0. Second, the factor model indicators are converted into dummy variables and summed. If the household head fully enjoys the digital dividend (i.e., has a smartphone, has opened a third-party payment account, incurred Internet costs last year, and made online purchases last year), the digital divide type (Dig_type) is 0. If the household head does not enjoy the digital dividend at all (i.e., has no smartphone, has not opened a third-party payment account, did not incur Internet costs last year, and did not make online purchases last year), the digital divide type (Dig_type) is 5. Values of Dig_type from 1 to 5 represent increasing severity of the digital divide faced by households. Finally, this paper matches the sample data with the Digital Financial Inclusion Index of Peking University (Dig_PK), which is rescaled to the range [0,100] and taken with a negative sign, and conducts the digital divide robustness test at the provincial level. The regression results are shown in Table 5. Across this series of robustness tests, the digital divide reduces the probability of formal employment of workers, and the conclusions of this paper still hold.
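The construction of the two alternative measures can be sketched as follows. The indicator names are hypothetical, "a negative option" is read here as "at least one negative response," and only the four indicators listed in the text are used, so the maximum of `dig_type` in this sketch is 4 rather than 5; a fifth indicator, which the text does not name, would be added in the same way.

```python
# Hypothetical indicator names; True means the household head "has" the item.
INDICATORS = ("smartphone", "payment_account", "internet_cost", "online_purchase")

def dig_dummy(head):
    """1 if at least one indicator is negative, else 0 (assumed reading)."""
    return int(any(not head[name] for name in INDICATORS))

def dig_type(head):
    """Number of negative indicators; 0 means the dividend is fully enjoyed."""
    return sum(1 for name in INDICATORS if not head[name])

# Toy household heads (not survey data).
full = {name: True for name in INDICATORS}
none = {name: False for name in INDICATORS}
partial = dict(full, online_purchase=False)

for label, head in (("full", full), ("partial", partial), ("none", none)):
    print(label, dig_dummy(head), dig_type(head))
```

The dummy discards information about how many indicators are missing, while the count preserves it, which is why the two measures serve as complementary robustness checks.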
Table 5
| Variables | (1) Mainwork | (2) Mainwork | (3) Mainwork |
| --- | --- | --- | --- |
| Dig_dummy | -0.208*** | | |
| | (0.042) | | |
| Dig_type | | -0.160*** | |
| | | (0.034) | |
| Dig_PK | | | -0.075*** |
| | | | (0.006) |
| Controls | YES | YES | YES |
| Obs. | 11965 | 11965 | 11965 |
| Pseudo R2 | 0.217 | 0.217 | 0.226 |
Mechanism analysis
Following the mediating effect procedures of J. Fang et al. (2017) [30] and C. L. Wen et al. (2014) [31], mediating effect analyses are conducted for information accessibility and risk tolerance, respectively, with the models specified in (3) and (4).
$$EM_{i}=\theta_{1}+\theta_{2}Dig_{i}+\theta_{3}Controls_{i}+\xi_{i} \tag{3}$$
$$Mainwork_{i}=\lambda_{1}+\lambda_{2}Dig_{i}+\lambda_{3}EM_{i}+\lambda_{4}Controls_{i}+\epsilon_{i} \tag{4}$$
where \(EM_{i}\) represents the mediating variable. First, the relationship between the digital divide and the mediating variable is tested; then the mediating variable is added to the original model (1) to test whether the digital divide still has a significant relationship with formal employment.
Information accessibility
As shown in column (1) of Table 6, the digital divide significantly reduces people's information accessibility. Column (2) shows that, after controlling for information accessibility, the digital divide still has a significant inhibitory effect on formal employment, with a mediating effect of 5.46%. The regression results confirm Hypothesis H2a.
Table 6
Analysis of the mediating effect of information access.
| Variable | (1) Infor | (2) Mainwork |
| --- | --- | --- |
| Dig | -0.013*** | -0.015*** |
| | (0.001) | (0.001) |
| Infor | | 0.063*** |
| | | (0.014) |
| Controls | YES | YES |
| Obs. | 11951 | 11951 |
| Pseudo R2 | 0.038 | 0.228 |

Note: ***, **, * indicate significance at the 1%, 5%, and 10% levels, respectively; standard errors are in parentheses.
Risk tolerance
As shown in column (1) of Table 7, the digital divide significantly reduces people's risk tolerance. Column (2) shows that, after controlling for risk tolerance, the digital divide still has a significant inhibitory effect on the propensity toward formal employment, with a mediating effect of 5.87%. The regression results confirm Hypothesis H2b.
Table 7
Analysis of mediating effects of risk preferences.
| Variable | (1) Risk | (2) Mainwork |
| --- | --- | --- |
| Dig | -0.020*** | -0.015*** |
| | (0.001) | (0.001) |
| Risk | | 0.044*** |
| | | (0.013) |
| Controls | YES | YES |
| Obs. | 10941 | 10941 |
| Pseudo R2 | 0.027 | 0.222 |

Note: ***, **, * indicate significance at the 1%, 5%, and 10% levels, respectively; standard errors are in parentheses.
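One common back-of-the-envelope way to express a mediating effect as a share is to divide the indirect effect (the product of the two path coefficients) by the sum of the indirect and direct effects. The sketch below applies this to the rounded coefficients in Tables 6 and 7. It is an assumption that this is comparable to the calculation behind the shares reported in the text: the paper does not state its formula, the tabulated coefficients are rounded, and products of coefficients from nonlinear models are only approximate, so the output is not expected to match 5.46% and 5.87% exactly.

```python
def mediation_share(a, b, direct):
    """Share of the total effect transmitted through the mediator.

    a: effect of Dig on the mediator (theta_2 in model (3))
    b: effect of the mediator on Mainwork (lambda_3 in model (4))
    direct: effect of Dig on Mainwork with the mediator controlled (lambda_2)
    """
    indirect = a * b
    return indirect / (indirect + direct)

# Rounded coefficients taken from Table 6 (Infor) and Table 7 (Risk).
infor_share = mediation_share(a=-0.013, b=0.063, direct=-0.015)
risk_share = mediation_share(a=-0.020, b=0.044, direct=-0.015)
print(f"Infor: {infor_share:.2%}, Risk: {risk_share:.2%}")
```

With the rounded inputs the two shares come out at roughly 5.2% and 5.5%, the same order of magnitude as the figures reported in the text.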