Data source
The main data set used in this paper is the China Education Panel Study (CEPS) 2013, which is a school-based study. It covers both rural and urban areas in mainland China and is a nationally representative survey that applies a stratified, multistage sampling design with probability proportional to size. The CEPS has a unique feature whereby once a classroom is selected for the sample, all students attending the class and their parents are surveyed. This key feature allows us to obtain a full picture of classmates’ emotion. The CEPS 2013 surveys students who were in grade 7 and grade 9 during the 2013 academic year. The academic year in China starts in September and ends in June of the following year. Students in grade 7 are generally 12-13 years old.
In addition to the 2013 wave, the CEPS conducted a follow-up survey in 2014 that followed the students who were in grade 7 in the 2013 wave and had moved up to grade 8. We use the CEPS 2014 only as a supplementary dataset, because several key variables either do not change over time (such as our main IV, illness before primary school) or were not surveyed in the 2014 wave (such as our alternative IV, parental health outcome). Hence, we cannot use the panel feature in regressions that rely on these variables. We do, however, use the panel feature in Section 4.4, in which parental conflict is used as an alternative IV.
All school-age children in China are entitled to a free and compulsory 9-year education by law. Grade 7 is the first year of middle school education and Grade 9 is the last year. Most middle schools and primary schools in China are public schools (93% of sample schools are public in CEPS), and students are assigned to public schools based on their residence location. Typically, students come to class in the morning and take all courses with the same classmates in the classroom throughout the day. Classrooms are usually fixed through an academic year, and students interact frequently within a classroom.
Regression samples
Most Chinese middle schools randomly assign students to classrooms at the beginning of the 7th grade, and many of them keep the assignment through the 9th grade, to ensure equal and fair educational opportunities at the level of compulsory education. We focus on schools that randomly assigned students at the beginning of grade 7 and made no reassignment in grade 8 or grade 9 in the 2013 academic year. In total, 109 schools are recorded in the CEPS 2013 data. According to the school principal survey of the CEPS, 93 schools report that they apply a random class assignment policy to grade 7 students. Of these 93 schools, 78 reported that they did not rearrange classrooms for grade 8 and grade 9. We also drop one school that had only one classroom in the sample, because a within-school comparison is not possible with a single classroom. Based on these criteria, the cross-sectional sample consists of 12,677 students in 301 classrooms in 77 schools. Each school has two to four classrooms.
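The three sample restrictions above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the records and the field names `random_assign_g7` and `reassigned_g8_g9` are hypothetical and do not correspond to actual CEPS variable names.

```python
from collections import Counter

# Hypothetical principal-survey records. Field names are illustrative
# assumptions, not actual CEPS variable names.
schools = [
    {"school_id": 1, "random_assign_g7": True, "reassigned_g8_g9": False},
    {"school_id": 2, "random_assign_g7": True, "reassigned_g8_g9": True},
    {"school_id": 3, "random_assign_g7": False, "reassigned_g8_g9": False},
    {"school_id": 4, "random_assign_g7": True, "reassigned_g8_g9": False},
]

# Hypothetical (school_id, class_id) pairs for the sampled classrooms.
classrooms = [
    (1, "1a"), (1, "1b"),
    (2, "2a"), (2, "2b"),
    (3, "3a"), (3, "3b"),
    (4, "4a"),
]


def select_schools(schools, classrooms):
    """Return IDs of schools that satisfy all three sample restrictions."""
    classes_per_school = Counter(school_id for school_id, _ in classrooms)
    selected = []
    for school in schools:
        if not school["random_assign_g7"]:
            continue  # no random assignment at the start of grade 7
        if school["reassigned_g8_g9"]:
            continue  # classrooms were rearranged in grade 8 or 9
        if classes_per_school[school["school_id"]] < 2:
            continue  # one classroom allows no within-school comparison
        selected.append(school["school_id"])
    return selected


selected = select_schools(schools, classrooms)
print(selected)
```

In this toy input only school 1 passes all three filters; school 4 is dropped for having a single sampled classroom, mirroring the one school excluded in the text.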
Variables and summary statistics
Negative emotion
Table 3 reports summary statistics of variables used in the sample. Our main outcome variable is a negative emotion: unhappiness. It is measured by the following survey question: “Did you feel unhappy in the last seven days?” Answers are rated on a 5-point Likert scale (1=never, 2=seldom, 3=sometimes, 4=often, 5=always). More than 50% of the students reported either “seldom” or “sometimes” feeling unhappy. The mean of individual-level unhappiness is 2.31, with a standard deviation of 1.04; the mean of classmates’ unhappiness is the same as that of the individual level, 2.31, with a smaller standard deviation of 0.29. The distribution of these two variables is presented in Figure 1. In addition to unhappiness, the survey also asked the same questions for three similar negative emotions: “not joyful,” “sad,” and “stressed.” We present the regression analysis for unhappiness in the main results and report the regressions for other negative emotions in the robustness checks reported in the Supplementary Information file (Table S6).
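To illustrate how a classmates' average can be built from individual responses, the sketch below computes a leave-one-out mean within one classroom. That the peer measure excludes the student's own response is an assumption based on the term "classmates"; the function name and the scores are hypothetical.

```python
def classmates_mean(values):
    """Leave-one-out mean: for each student, the average over all other
    students in the same classroom (assumed definition of the peer measure)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two students in a classroom")
    total = sum(values)
    return [(total - v) / (n - 1) for v in values]


# Hypothetical unhappiness scores (1-5 Likert scale) for one classroom.
scores = [1, 2, 3, 2, 4]
peer = classmates_mean(scores)

own_mean = sum(scores) / len(scores)
peer_mean = sum(peer) / len(peer)
print(peer, own_mean, peer_mean)
```

Under this definition the average of the leave-one-out means within a classroom equals the classroom's own mean, which is consistent with the two means in Table 3 coinciding at 2.31, while averaging over classmates compresses the spread.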
Table 3. Summary Statistics

| Variable | Mean | SD |
|---|---|---|
| Individual Characteristics | | |
| Unhappiness | 2.31 | 1.04 |
| Illness | 0.14 | 0.35 |
| Parental Conflict | 0.10 | 0.30 |
| Parental Health | 3.85 | 0.91 |
| Wealth | 3.01 | 0.55 |
| Age | 13.95 | 1.36 |
| Female | 0.48 | 0.50 |
| Minority | 0.12 | 0.32 |
| One Child | 0.46 | 0.50 |
| Grade 9 | 0.46 | 0.50 |
| Father’s Years of Schooling | 10.23 | 3.52 |
| Mother’s Years of Schooling | 9.50 | 3.84 |
| Peer Characteristics | | |
| Classmates’ Unhappiness | 2.31 | 0.29 |
| Classmates’ Illness | 0.14 | 0.08 |
| Classmates’ Parental Conflict | 0.10 | 0.06 |
| Classmates’ Parental Health | 3.85 | 0.29 |
| Classroom Characteristics | | |
| Classroom Size | 47.57 | 13.78 |
| Percentage of Female | 0.48 | 0.09 |
| Teacher Characteristics | | |
| Female | 0.63 | 0.48 |
| Age | 37.02 | 7.52 |
| Years of Schooling | 15.93 | 0.46 |
| Years of Experience | 15.92 | 8.09 |
| Observations | 12,677 | |
| Number of Classrooms | 301 | |
| Number of Schools | 77 | |

Data: CEPS 2013.
Illness
The variable illness is measured using the following survey question in the CEPS’s 2013 wave: “Did you have a serious illness before you started elementary school?” About 14% of students answered “Yes”; the classroom-level average has a standard deviation of 0.08. Figure 2 presents the distribution of the proportion of students who answered “Yes” per classroom. Our alternative measure of illness uses a similar survey question, which is answered by students’ parents in the parent survey of the CEPS 2013: “Did your child have a serious illness before the child started elementary school?” About 9% of parents in the sample answered “Yes,” with a standard deviation of 0.08. The incidence of illness is lower in the parent-reported measure than in the students’ self-reports. This difference is likely to arise because parents and children define serious illness differently and may perceive the same illness differently. We find that compared with the parent-reported measure, the student self-reported measure is more closely correlated with the student’s own emotion, and therefore yields a stronger first-stage result (Table 2 vs Table S4).
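As a quick consistency check on these magnitudes, recall that a binary indicator with mean $p$ has standard deviation $\sqrt{p(1-p)}$. The symbol $D_i$ for the individual illness indicator is introduced here only for illustration. With $p = 0.14$ from Table 3:

```latex
\mathrm{SD}(D_i) = \sqrt{p\,(1-p)} = \sqrt{0.14 \times 0.86} = \sqrt{0.1204} \approx 0.35
```

This matches the individual-level standard deviation of Illness in Table 3 (0.35), whereas the value 0.08 coincides with the standard deviation that Table 3 reports for Classmates’ Illness.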
A slightly different question regarding children’s illness is posed in the CEPS’s parent survey in 2014. The survey asks parents to report whether their child had any of the following types of serious illness and, if so, when the child had it: heart, brain, limb, kidney, and lung. In total, 4.2% of parents reported that their child had at least one of these diseases before age 7: heart (0.3%); brain (0.2%); limb (2.1%); kidney (0.2%); and lung (1.9%).
Control variables
Of the student characteristics, family wealth is measured using students’ self-reported family financial conditions on a 5-point scale that ranges from 1=very poor to 5=very rich. Students are on average 14 years old; 48% are female; 12% are minorities; 46% are One Child, meaning that they do not have siblings; 46% are in grade 9; and 54% are in grade 7 (Table 3). Respondents’ fathers have on average 10.23 years of education, and mothers have 9.5 years. On average, each classroom has about 48 students, of whom 48% are female. Teachers are on average 37 years old and have about 16 years of education and 16 years of experience; 63% of teachers are female.
Empirical strategy
The main difficulties in identifying the causal effect of peers’ emotion on an individual’s emotion are self-selection, common shocks, and the reflection issue. Self-selection (also called “sorting” or “homophily”) is the tendency of like to attract like: In the case of school children, we might observe that both a student and her classmates are happy only as a result of sorting. For example, students (or their parents) who care more about their well-being may self-select into a school or classroom in which most students are well-behaved and interact in a friendly manner. The common shocks issue (also called “contextual effects”) arises because individuals and their social contacts are affected by a common environment. The common environment for school children could be their teacher’s character or disposition. Hence, individuals’ emotions could be correlated even without one affecting another. The size of the causal effect of social influence is difficult to identify, because individuals simultaneously affect each other; this problem is called the reflection issue [69]. In this section, we describe our empirical strategy for addressing each.
To address the self-selection issue, we restrict our sample to schools in which students are randomly assigned to classrooms in grade 7—the first year of middle school—and no further reassignment is made in grades 8 or 9. School fixed effects control for preexisting factors that could affect selection into a school or community. The fixed effects also control for school- or neighborhood-level environmental factors that may cause overall differences in emotions across schools or neighborhoods.
Both the reflection issue and the common environment issue arise from the fact that students study and interact in the same classroom on a daily basis. Thus, we address these challenges by isolating the variation in classmates’ emotions that is not determined in the classroom. Specifically, we instrument classmates’ emotions using classmates’ early childhood health condition—i.e., serious illness before primary school.
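One way to write the instrumental-variable setup described above is sketched below. The notation is an assumption introduced for illustration and does not reproduce the paper's own equations: $E_{ics}$ is the unhappiness of student $i$ in classroom $c$ of school $s$; $\bar{E}_{-i,cs}$ and $\bar{H}_{-i,cs}$ are the averages, over $i$'s classmates, of unhappiness and of serious illness before primary school; $X_{ics}$ is a vector of controls; and $\mu_s$ and $\lambda_s$ are school fixed effects.

```latex
\begin{align}
\bar{E}_{-i,cs} &= \pi\,\bar{H}_{-i,cs} + X_{ics}'\delta + \mu_s + v_{ics}
  && \text{(first stage)} \\
E_{ics} &= \beta\,\widehat{\bar{E}}_{-i,cs} + X_{ics}'\gamma + \lambda_s + \varepsilon_{ics}
  && \text{(second stage)}
\end{align}
```

In this sketch $\beta$ is the peer effect of interest, and $\widehat{\bar{E}}_{-i,cs}$ is the fitted value from the first stage, so only the part of classmates' unhappiness that is predicted by their early childhood illness enters the second stage.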
Whether classmates had a serious illness before primary school is very likely to be an exogenous source of variation; illness in the past cannot be affected by the current classroom environment or current classmates’ characteristics. Furthermore, one’s illness before primary school could potentially affect one’s own emotion. An individual’s emotion is generally correlated with her health condition [48,65,73,74,81]. A serious disease in childhood could potentially have long-term consequences for her current emotions [24,84]. Thus, we can expect a certain amount of variation in classmates’ emotion that is driven by their average early childhood health conditions. Our data support this conjecture. As shown in Figure 3, we find a strong positive correlation between the IV (the average incidence of classmates’ illness) and the endogenous variable of interest (classmates’ average emotion). In the figure, we divide our sample into three equal groups based on the proportion of classmates who had a serious illness before primary school: low, medium, and high. The figure suggests that students in the high group (i.e., classes where more classmates had a serious illness) are more likely to report being unhappy than those in the low group.
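The three-group comparison behind Figure 3 can be sketched as follows. This is an illustrative reconstruction under assumptions: the function name, the cut-point rule for ties and remainders, and the data are hypothetical, and the actual figure may be constructed differently.

```python
def tercile_means(illness_share, unhappiness):
    """Sort observations by classmates' illness share, split them into
    three (near-)equal groups, and return mean unhappiness per group."""
    n = len(illness_share)
    if n < 3 or n != len(unhappiness):
        raise ValueError("need at least three observations of equal length")
    order = sorted(range(n), key=lambda i: illness_share[i])
    cuts = [0, n // 3, 2 * n // 3, n]
    means = {}
    for k, label in enumerate(["low", "medium", "high"]):
        idx = order[cuts[k]:cuts[k + 1]]
        means[label] = sum(unhappiness[i] for i in idx) / len(idx)
    return means


# Hypothetical data: classmates' illness share and classmates' mean unhappiness.
illness_share = [0.05, 0.20, 0.10, 0.25, 0.15, 0.02]
unhappiness = [2.0, 2.5, 2.2, 2.7, 2.3, 2.1]

groups = tercile_means(illness_share, unhappiness)
print(groups)
```

With these made-up values, mean unhappiness rises from the low to the high group, which is the pattern the text describes for the actual data.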
The identification strategy relies on idiosyncratic variation in the underlying health conditions across classes in the same school generated by within-school random class assignment; some classes will simply have a greater share of students who had a serious illness in early childhood than other classes. The randomly assigned composition of classrooms therefore may cause a student to have more or less negative emotion than a student in another classroom.
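The role of within-school variation can be made concrete with a stripped-down estimator. The sketch below is a simplification under stated assumptions, not the paper's specification: it uses a single instrument, no control variables, and no standard errors or clustering. With school fixed effects and one instrument, the IV estimate equals the ratio of within-school covariances, so all variables are first demeaned by school. All names and data are hypothetical.

```python
from collections import defaultdict


def demean_within(values, groups):
    """Subtract the group (school) mean from each value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def iv_within_school(y, x, z, school):
    """Just-identified IV estimate of the effect of x on y, using z as the
    instrument and absorbing school fixed effects by within-school demeaning."""
    y_t = demean_within(y, school)
    x_t = demean_within(x, school)
    z_t = demean_within(z, school)
    denominator = sum(zi * xi for zi, xi in zip(z_t, x_t))
    if denominator == 0:
        raise ValueError("instrument has no within-school covariance with x")
    numerator = sum(zi * yi for zi, yi in zip(z_t, y_t))
    return numerator / denominator


# Hypothetical data: two schools with three observations each.
school = ["A", "A", "A", "B", "B", "B"]
z = [0.10, 0.20, 0.30, 0.10, 0.15, 0.30]  # classmates' illness share
x = [2.0, 2.2, 2.5, 2.4, 2.6, 3.0]        # classmates' mean unhappiness
school_effect = {"A": 1.0, "B": 3.0}
# Own unhappiness generated with a true peer effect of 0.5 plus a school effect.
y = [0.5 * xi + school_effect[s] for xi, s in zip(x, school)]

beta = iv_within_school(y, x, z, school)
print(beta)
```

Because the toy outcome is generated without noise, the estimate recovers the assumed coefficient of 0.5 up to floating-point error, even though the two schools differ in their overall level of unhappiness.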
The identification strategy will be invalid if schools sort students based on their observed characteristics (even though schools reported that classroom assignments were made randomly) or if, in a very unlikely scenario, early childhood illness were somehow affected by a factor in the current classroom. We verify these identification assumptions in Tables S1 and S2.
The IV identifies a “local” effect: one’s negative emotion that is specifically caused by classmates’ early childhood health condition. To show that emotional contagion could be caused by various exogenous shocks, we further exploit two alternative IVs that also stem from shocks outside the classroom (classmates’ parental conflict at home and classmates’ parental health condition). These yield consistent results, as reported in Table S7.