2.1 Data
The data used in this study come from the 2017 China Migrants Dynamic Survey (CMDS), a nationally representative survey of migrants aged 15-59 conducted by the National Health Commission of China [19,20]. The survey collects internal migrants’ demographic characteristics, health status, self-reported health, and whether a health record has been established for the individual. Those who had migrated for less than six months were excluded because they are not eligible for NBPHS. The final sample included 150,384 observations, of which 45,240 internal migrants had health records and 105,144 did not.
2.2 Health outcome variables
The outcome variables are two recognized measures of health status: self-reported health and whether the respondent had been ill in the past year. Self-reported health is measured on a four-point scale from 1 (unhealthy) to 4 (very healthy). For the second indicator, the survey question is “have you been ill (injured) or unwell in the past year”, with three response options: “Yes, the last one happened in two weeks”, “Yes, the last one happened two weeks ago” and “No”. The first two responses are combined as having been ill in the past year and coded 1. The third response means no illness in the past year and is coded 0.
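The recoding of the illness question described above can be sketched in Python as follows. This is a minimal illustration: the dictionary and function names are hypothetical, and the response strings are assumed to match the wording quoted in the text rather than the actual CMDS codebook values.

```python
# Hypothetical mapping from the three response options to the binary outcome.
ILLNESS_CODES = {
    "Yes, the last one happened in two weeks": 1,   # ill in the past year
    "Yes, the last one happened two weeks ago": 1,  # ill in the past year
    "No": 0,                                        # not ill in the past year
}

def code_illness(response):
    """Return 1 if the respondent reported illness in the past year, 0 if not.

    Unrecognized responses return None so that they can be treated as missing.
    """
    return ILLNESS_CODES.get(response.strip())

ill = [code_illness(r) for r in ["No", "Yes, the last one happened in two weeks"]]
```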
2.3 Core independent variables
As previously noted, the establishment of a health record is the basis for receiving NBPHS. Enrollment is therefore measured by a binary variable indicating whether a health record has been established.
In the CMDS, the question is “whether the health record has been established for you locally?”, and there are four answers: (1) Yes, it has been established. (2) It has not been established and I never heard about it. (3) It has not been established but I’ve heard about it. (4) I’m not clear about it. The health record variable takes a value of 1 if the individual reports that a health record has been established, and 0 otherwise.
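The coding of the health record indicator can be sketched in Python as below. The integer codes 1 to 4 are assumed to follow the order in which the answers are listed above, and the function name is hypothetical. Under this coding, answer (4), “I’m not clear about it”, is grouped with the not-enrolled answers.

```python
def code_health_record(answer):
    """Return 1 if a health record has been established (answer 1), else 0.

    `answer` is assumed to be an integer from 1 to 4 in the order listed in
    the text; this numbering is an assumption, not the CMDS codebook.
    """
    if answer not in (1, 2, 3, 4):
        raise ValueError(f"unexpected answer code: {answer!r}")
    return 1 if answer == 1 else 0

hr = [code_health_record(a) for a in (1, 2, 3, 4)]
```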
2.4 Covariates
The covariates fall into three categories: personal, household, and social characteristics. Personal characteristics include gender, hukou, age, migration time, marital status, education, range of migration, willingness to stay, migration reason, and career. Household and social characteristics include household income, the status of house property, the time needed to reach the nearest medical facility, the number of people living together, and the city level.
Hukou is a binary variable, equal to 1 if registered in a town, county, or city, and equal to 0 if registered in the countryside. Marital status is divided into three categories: not married, married, and divorced/widowed. Education is an ordered variable with seven levels: below elementary school, elementary school, middle school, high school/technical secondary school, junior college, undergraduate college, and graduate. Seven reasons for migration are included: work, family (relocation, marriage, birth), friends (for help in some way), study, joining the army, providing for the aged, and other reasons. The range of migration is a variable with three categories: equal to 1 if across counties within a city, equal to 2 if across cities within a province, and equal to 3 if across provinces. The willingness to stay is a binary variable that takes a value of 1 if the migrant wishes to remain in the locality and 0 otherwise. There are seven categories of career and three categories of city level, as listed in Table 1.
Table 1 presents the descriptive statistics. On average, individuals with health records have better self-reported health and are less likely to have been ill in the past year. Health outcomes differ significantly between the two groups. The covariates also differ significantly between the two groups, except for migration time, other migration reasons, and the career of businessmen. On average, enrolled internal migrants are older, more likely to be married, more educated, and more willing to stay, live with more people, and have lower household income. There are also several disparities in migration reasons, career, and regions.
2.5 Empirical strategy
With non-experimental data, matching is one possible way to address the selection problem that arises when individuals select into treatment [21]. Propensity score matching (PSM) is suggested as a way to reduce multiple observable characteristics to a single dimension [22]. The basic idea is to find, within a large group of individuals in the control group, those who are similar to the individuals in the treatment group in their pre-treatment characteristics [21]. In this study, this means pairing treated and untreated internal migrants who are comparable but differ in treatment status.
The propensity score was estimated first. It is the probability of having a health record, which represents whether an individual can receive NBPHS. Individuals whose propensity scores fell outside the common range of propensity scores (the common support) were excluded from the subsequent analysis. Following previous studies [12,23], a logit regression was used to estimate the probability of enrollment via equation (1).
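A standard logit specification consistent with the variables defined in this section can be written as follows. The coefficient symbols (\beta_0, \beta_p, \beta_h, \beta_s) and the linear form of the index are assumptions made for illustration, since the text names only the variables.

```latex
\begin{equation}
\ln\frac{P(HR_i = 1)}{1 - P(HR_i = 1)}
  = \beta_0 + \beta_p X_{pi} + \beta_h X_{hi} + \beta_s X_{si}
\tag{1}
\end{equation}
```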

HRi indicates whether individual i has established a health record, that is, whether the individual is enrolled in NBPHS. Xpi is a vector of personal characteristics, Xhi is a vector of household characteristics, and Xsi is a vector of social characteristics, as described above.
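The estimation of the propensity score and the common-support restriction can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure used in the study: the logit is fitted by plain gradient ascent, the single covariate and the toy data are hypothetical, and common support is defined by comparing the minima and maxima of the scores in the two groups, which is one common convention.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logit(X, hr, lr=0.5, epochs=3000):
    """Fit a logit model by batch gradient ascent on the mean log-likelihood.

    X  : list of covariate rows (hypothetical stand-ins for Xp, Xh, Xs)
    hr : list of 0/1 health-record indicators
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, h in zip(X, hr):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            grad[0] += h - p
            for j, v in enumerate(row):
                grad[j + 1] += (h - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Predicted probability of having a health record for each row of X."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            for row in X]

def on_common_support(scores, hr):
    """Flag observations whose score lies in the overlap of both groups' ranges."""
    treated = [s for s, h in zip(scores, hr) if h == 1]
    control = [s for s, h in zip(scores, hr) if h == 0]
    low = max(min(treated), min(control))
    high = min(max(treated), max(control))
    return [low <= s <= high for s in scores]

# Hypothetical toy data: one covariate, enrollment more frequent at higher values.
X, hr = [], []
for x, n_enrolled in [(-2, 1), (-1, 3), (0, 5), (1, 7), (2, 9)]:
    for i in range(10):
        X.append([float(x)])
        hr.append(1 if i < n_enrolled else 0)
X.append([4.0])   # an enrolled individual with no comparable control
hr.append(1)

beta = fit_logit(X, hr)
scores = propensity_scores(X, beta)
keep = on_common_support(scores, hr)
```

In this toy example the last observation has a propensity score above that of every control, so it falls outside the common support and would be dropped before matching.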
2.6 Matching algorithms
Four different PSM matching algorithms were used to ensure the robustness of the results. First, 1:1 and 1:4 nearest neighbor (NN) matching with replacement were used, which means that the closest one and the closest four partners, respectively, are chosen from the comparison group to match each treated individual. Second, radius matching was used, where the basic idea is to use all members of the control group within the radius, so that more individuals can be used for matching, which helps avoid the risk of poor matches [24]. In the present study, the caliper is set at 0.1 for each treated individual, so matched controls are selected from those whose propensity scores differ from that of the treated individual by no more than 0.1. Third, the results of kernel matching (KM) are also reported. With kernel matching, the counterfactual outcome is constructed as a weighted average of all members of the control group. Because more information is used, its advantage is lower variance [21].
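The matching estimators described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the study: the function names and toy data are hypothetical, the quantity computed is the average treatment effect on the treated (ATT), and the Gaussian kernel and bandwidth of 0.06 are assumptions, since the text does not specify the kernel.

```python
import math

def _split(d):
    """Return index lists of treated (d == 1) and control (d == 0) units."""
    treated = [i for i, di in enumerate(d) if di == 1]
    controls = [i for i, di in enumerate(d) if di == 0]
    return treated, controls

def att_nearest_neighbor(ps, d, y, k=1):
    """ATT from 1:k nearest-neighbor matching with replacement."""
    treated, controls = _split(d)
    effects = []
    for i in treated:
        # The k controls with the closest propensity scores; a control may be
        # reused for several treated units (matching with replacement).
        nearest = sorted(controls, key=lambda j: abs(ps[j] - ps[i]))[:k]
        effects.append(y[i] - sum(y[j] for j in nearest) / len(nearest))
    return sum(effects) / len(effects)

def att_radius(ps, d, y, caliper=0.1):
    """ATT from radius matching: average all controls within the caliper."""
    treated, controls = _split(d)
    effects = []
    for i in treated:
        within = [j for j in controls if abs(ps[j] - ps[i]) <= caliper]
        if within:  # treated units with no control inside the caliper are dropped
            effects.append(y[i] - sum(y[j] for j in within) / len(within))
    return sum(effects) / len(effects)

def att_kernel(ps, d, y, bandwidth=0.06):
    """ATT from kernel matching with a Gaussian kernel (assumed kernel choice)."""
    treated, controls = _split(d)
    effects = []
    for i in treated:
        # Every control contributes, weighted by closeness in propensity score.
        w = [math.exp(-0.5 * ((ps[j] - ps[i]) / bandwidth) ** 2) for j in controls]
        counterfactual = sum(wj * y[j] for wj, j in zip(w, controls)) / sum(w)
        effects.append(y[i] - counterfactual)
    return sum(effects) / len(effects)

# Hypothetical toy data: propensity score, treatment status, self-reported health.
ps = [0.30, 0.50, 0.70, 0.28, 0.35, 0.52, 0.65, 0.90]
d = [1, 1, 1, 0, 0, 0, 0, 0]
y = [3, 4, 4, 2, 3, 3, 4, 1]

results = {
    "NN 1:1": att_nearest_neighbor(ps, d, y, k=1),
    "NN 1:4": att_nearest_neighbor(ps, d, y, k=4),
    "radius": att_radius(ps, d, y, caliper=0.1),
    "kernel": att_kernel(ps, d, y),
}
```

Comparing the four entries of `results` mirrors the robustness check in the text: if the estimates have the same sign and similar magnitude across algorithms, the conclusion does not hinge on one matching rule.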