Patterns of human social contact and mask wearing in representative high-risk groups in China

Background: The COVID-19 pandemic has changed human behavior in areas such as contact patterns and mask-wearing frequency. Exploring human-human contact patterns and mask-wearing habits in high-risk groups is an essential step in fully understanding the transmission of respiratory infection-based diseases. Methods: Delivery workers, medical workers, preschoolers, and students from Qinghai, Shanghai, and Zhejiang were recruited to complete an online questionnaire that queried general information, logged contacts, and assessed the willingness to wear a mask in different settings. The numbers of contacts across different characteristics were assessed and age-specific contact matrices were established. A generalized additive mixed model was used to analyze the associations between the number of individual contacts and several characteristics. The factors influencing the frequency of mask wearing were evaluated with a logistic regression model. Results: A total of 611,287 contacts were reported by 15,635 participants. The frequency of daily individual contacts averaged 3.14 (95%CI: 3.13-3.15) people per day, while that of group contacts was 37.90 (95%CI: 37.20-38.70). Skin-to-skin contact and long-duration contact were more likely to occur at home or among family members. The contact matrices of students were the most assortative (all contacts q-index = 0.899, 95%CI: 0.894-0.904). Participants with larger household sizes reported having more contacts. A higher household income per capita was significantly associated with a greater number of contacts among preschoolers and students. In each of the public places, the frequency of mask wearing was highest in delivery workers. For preschoolers and students with more contacts, the proportion of those who reported always wearing masks was lower (P<0.05) in schools/workplaces and public transportation than for preschoolers and students with fewer contacts.
Conclusions: The rate of mask wearing must be improved among preschoolers and students, considering their susceptibility and lower mask-wearing rates. Contact screening efforts should be concentrated in the home, school, and workplace after an outbreak of an epidemic, as more than 75% of all contacts, on average, will be found in such places. Age-stratified and occupation-specific social contact research in high-risk groups can help inform policy-making decisions during the post-relaxation period of the COVID-19 pandemic.


Background
Social mixing patterns differ according to age [1] and are strongly assortative in terms of age-based contact rates [2]. Factors such as weekdays versus weekend days, daily out-of-subdistrict travel, animal rearing, participant age, and household size have been associated with more social contacts [3,4]. These self-reported contacts are relevant to the transmission patterns of acute respiratory infections, such as mumps, influenza, chickenpox, and parvovirus [5][6][7]. Respiratory-borne diseases and relevant emerging pathogens with established human-human transmission can spread through the exchange of respiratory droplets between people engaging in person-to-person contact [8] or in close physical proximity to one another [9]. Thus, the human spread or even outbreak of these diseases is likely to be driven by patterns of human encounters [4]. It is important to quantify these interactions, especially in light of how different age groups mix; these data can be used to increase the effectiveness of disease-targeting interventions, such as vaccination, contact tracing, and social distancing [10], and to generate mathematical models that can predict the course of an epidemic and the effectiveness of interventions [11]. For example, the well-known POLYMOD study [4] investigated social contact patterns in eight European countries, combined the findings with serological data, and found that intimate contact can explain the transmission of varicella and parvovirus B19 infection [6]. Dodd et al. used social contact pattern data to enumerate "close" (shared conversation) and "casual" (shared indoor space) social contacts in 16 Zambian communities and eight South African communities to model the incidence of Mycobacterium tuberculosis infection among adults [12]. Data on human interaction patterns can therefore help researchers clarify risk factors for transmission and design interventions for controlling epidemics.
That said, most of the relevant research was conducted before the outbreak of COVID-19 and the ensuing implementation of control measures. These measures have reshaped the behavior patterns of Chinese society [3], especially in terms of mask wearing and social distancing. In addition, few (if any) of the existing surveys focused on high-risk people, such as delivery workers, medical workers (who interact in large groups), or school-aged children (who generally have low levels of prior immunity [13]). Finally, the rapid economic development, high urbanization, and frequent human interactions of China [14] mean that this country plays an important role in global pandemics of respiratory-transmitted diseases, such as influenza [15]. Thus, efforts to identify new authentic parameters for contact patterns in high-risk groups after the COVID-19 outbreak in China will be critical for improving the accuracy of mathematical models in predicting the spread of infections and assessing preventive measures [1].
Here, we selected districts/counties of Shanghai, Zhejiang Province, and Qinghai Province as our study sites.
A diary-based survey was employed to survey social contacts at these sites between December 2020 and March 2021. This study had three aims: (i) to quantify local human-human (H-H) contacts in high-risk groups in representative provinces of China, (ii) to explore the occupation-specific assortativity and heterogeneity of social contacts, and (iii) to assess the behavioral pattern of mask wearing under the existing COVID-19 prevention and control measures.

Study sites and sampling
Our survey was carried out between December 2020 and March 2021 in three provinces of China (Shanghai, Zhejiang, and Qinghai), which were chosen because they differ in population density and socioeconomic factors. The Minhang and Songjiang Districts in Shanghai City, Huzhou City in Zhejiang Province, and Haidong City and Haixizhou City in Qinghai Province were purposively selected as survey sites; from them, sufficient districts/counties were sampled following multi-stage stratified sampling. Lists of hospitals, schools (kindergartens, primary, junior, and senior schools) and delivery companies were obtained from each district/county. The list elements were randomly reordered, and recruitment of eligible individuals was attempted in sequence from this list with the help of local district/county workers. The high-risk populations targeted in the study were delivery workers, medical workers, preschoolers, and students. Preschoolers, mostly aged 0-5 years, are children who have not yet entered elementary school, and students include elementary, junior, and senior high school students, who are generally aged 6-19. We did not set any maximum predefined target size for each specific institution. According to our calculations, the sample size of this cross-sectional study should be at least 9811 participants, of whom 20% might fail to complete the survey. Thus, we needed to recruit at least 818 participants from each group per site. An online questionnaire was used to collect the relevant information, and all obtained questionnaires were subject to quality audit by the investigator.
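The per-group quota can be checked with simple arithmetic. The sketch below assumes that the stated minimum total was split evenly across the four occupation groups and the three sites; this even split is our reading of the text, not a documented step of the sampling protocol.

```python
import math

# Figures taken from the text; the even split across cells is an assumption.
TOTAL_MIN_SAMPLE = 9811  # minimum total sample size stated in the paper
N_GROUPS = 4             # delivery workers, medical workers, preschoolers, students
N_SITES = 3              # Shanghai, Zhejiang, Qinghai


def per_cell_quota(total: int, n_groups: int, n_sites: int) -> int:
    """Smallest whole number of participants per group per site that
    reaches `total` when every group-by-site cell recruits equally."""
    return math.ceil(total / (n_groups * n_sites))


quota = per_cell_quota(TOTAL_MIN_SAMPLE, N_GROUPS, N_SITES)
print(quota)
```

Under this assumption the result matches the 818 participants per group per site quoted above.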
For delivery workers, we sampled several delivery companies per site and established WeChat Groups for each of them. WeChat Groups, a social communication platform widely used in China, bring together many individual users, who use them to chat with different circles of friends, send event and promotional news to potential customers, or share and discuss information related to business or interests. Because these groups support rich modes of information sharing, we tried, with the help of the head of each selected delivery company, to recruit all of that company's delivery workers into our WeChat Groups. A trained investigator joined each WeChat Group, introduced the purpose of the study, and provided information on filling out the questionnaire. The delivery workers who consented to participate in the survey were asked to fill in a web-based questionnaire. Medical workers from one, two, or three hospitals of each site were selected as participants and asked to complete the survey. Preschoolers were recruited from one or two kindergartens of each site. At each of the three sites, primary, junior, and senior schools (one each) were selected, and students from one class of each grade in the school were included. Parents of preschoolers and primary school students were asked to recall their child's contacts and complete the questionnaire. Junior and senior school students were asked to self-complete the questionnaire after their parents' informed consent was obtained. For all participants included in the study, an informed consent form was obtained from the participant or their parents (preschoolers and primary school students). All included participants had lived in the district/county for at least 1 month prior to being enrolled in the study.

Survey contents and methods
The questionnaire consisted of three sections: general information, contact frequency, and willingness to wear a mask in different settings (see Supplementary Text). The queried general information comprised the respondent's demographics, including their age, sex, income, duration of local living, and household size. Referring to POLYMOD [4], contact was defined as: (i) a two-way conversation with three or more words in the physical presence of another person (conversational touch), or (ii) skin-to-skin contact (such as a handshake, hug, kiss, or contact sport). In the contact diary, for "individual contacts" (those occurring with up to 19 persons), participants were asked to list each person with whom they had contact during a day and give some details about each contacted individual, including their age (or age range, which was replaced by the mid-point of the range for the purposes of analysis), sex, relationship to respondent (relative, colleague/classmate, friend, teacher, or other), contact type (physical contact or not), setting of the encounter (home, school, office, transportation, or other), duration (less than 5 min, 5 to 15 min, 15 to 60 min, 1 to 4 hours, more than 4 hours) and frequency (almost every day, once or twice per week, once or twice per month, less than once per month, first meeting). If a participant had contact with the same person several times in a given day, it was recorded as one contact and the total duration of all interactions was used. As this item-by-item approach is inappropriate for recording numerous contacts, we defined a "group contact" as that occurring with 20 or more people per day (i.e., due to occupation or participation in some activity), which is the total number of individuals estimated to be contacted per day in groups. For group contacts, participants reported the number of age-specific group contacts per day.
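The two diary-cleaning rules described above (replacing an age range with its mid-point, and merging repeated encounters with the same person on the same day into one contact with a summed duration) can be sketched as follows. The field names, the use of minutes as the duration unit, and the assignment of boundary values to bins are all hypothetical choices made for illustration; the survey itself recorded duration in categories.

```python
from collections import defaultdict

# Upper bounds (in minutes) of the survey's duration categories.
# Assigning a boundary value to the higher bin is our assumption.
DURATION_BINS = [(5, "<5 min"), (15, "5-15 min"), (60, "15-60 min"), (240, "1-4 h")]


def age_midpoint(age_range):
    """Replace a reported (low, high) age range with its mid-point."""
    low, high = age_range
    return (low + high) / 2


def duration_category(minutes):
    """Map a total duration in minutes to one of the survey's categories."""
    for upper, label in DURATION_BINS:
        if minutes < upper:
            return label
    return ">4 h"


def collapse_daily_contacts(entries):
    """Merge repeated encounters with the same person on the same day into
    a single contact whose duration is the sum over all encounters."""
    totals = defaultdict(float)
    for entry in entries:
        key = (entry["participant"], entry["day"], entry["contact"])
        totals[key] += entry["minutes"]
    return dict(totals)


# Hypothetical diary entries, for illustration only.
entries = [
    {"participant": "p1", "day": 1, "contact": "mother", "minutes": 30},
    {"participant": "p1", "day": 1, "contact": "mother", "minutes": 45},
    {"participant": "p1", "day": 1, "contact": "classmate", "minutes": 10},
]
merged = collapse_daily_contacts(entries)
for key, minutes in merged.items():
    print(key, minutes, duration_category(minutes))
```

In this toy diary the two encounters with the same person are counted as one contact, so the participant has two individual contacts for the day rather than three.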
Regarding mask wearing, for each of the listed places (school/workplace, public transportation, training institution, outdoor public space, non-enclosed indoor public space, enclosed indoor public space, medical place) visited in the prior month, participants were asked to report their frequency of mask wearing (never, rarely, occasionally, often, every time) at the site.
This survey was designed to be fully anonymous and the names of participants were not recorded at any point. The study received ethical approval from the School of Public Health, Fudan University.

Statistical analysis
The distributions of contact numbers were assessed for each region, occupation, sex, age group, education level, household income per capita, and household size. For individual contacts, the proportions of contact duration, setting, relationship, and frequency were plotted. Proportions of different contact settings and relationships were also stratified by contact duration and frequency.
We established different age classes among the different occupations and regions to build our age-specific H-H contact matrices, with the goal of estimating the age-specific individual/overall contact number per participant per day. Participants did not report the exact age of each member of a group contact, so we modeled the age distributions for individual contacts using the Gaussian kernel function, stratified by the age groups of participants in each occupation (Supplementary Figure S1). Then we drew the age of group contacts randomly from the model, with reference to the age group of the participants and the group contacts. We repeated the sampling process 200 times to estimate uncertainty. We used q-indices, which represented departures from proportionate mixing and ranged from zero (proportionate) to one (fully assortative), and bootstrapped 95% confidence intervals to assess the degree of age assortativity [16].
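The text does not spell out how the q-index was computed. One commonly used trace-based formulation, which has the stated properties (zero under proportionate mixing, one under fully assortative mixing), is Q = (tr(P) - 1) / (n - 1), where P is the contact matrix with each row normalized to sum to one and n is the number of age groups. The sketch below implements that formulation under the assumption that it matches the index used here; the example matrices are made up for illustration.

```python
def q_index(contact_matrix):
    """Trace-based assortativity index of a square contact matrix.

    Rows are participant age groups and columns are contact age groups.
    Each row is first normalized to proportions, so the result is
    (trace(P) - 1) / (n - 1). Assumed formulation, not confirmed by the text.
    """
    n = len(contact_matrix)
    if n < 2:
        raise ValueError("need at least two age groups")
    trace = 0.0
    for i, row in enumerate(contact_matrix):
        row_total = sum(row)
        if row_total == 0:
            raise ValueError("age group %d reported no contacts" % i)
        trace += row[i] / row_total
    return (trace - 1) / (n - 1)


# Illustrative matrices (not survey data).
proportionate = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]   # everyone mixes alike
fully_assortative = [[5, 0, 0], [0, 2, 0], [0, 0, 7]]  # same-age contacts only
print(q_index(proportionate), q_index(fully_assortative))
```

A confidence interval of the kind reported in the Results could then be obtained by resampling participants with replacement, rebuilding the matrix, and recomputing the index for each resample.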
A Generalized Additive Mixed Model (GAMM) with a negative binomial distribution was used to analyze the association between the number of total contacts (individual and group contact) and the selected variables (sex, household size, household income per capita, weekdays or weekend days, and region) in the different occupation groups. We fitted thin plate regression splines to explore potential nonlinear relationships between continuous participant age and contact number.
Chi-squared tests were used to compare the distribution of mask wearing between regions and settings among the four occupations. We divided mask wearing into two levels (wearing a mask every time or not at all) and used univariate logistic regression to analyze factors that might affect mask wearing. Independent variables (occupation, contact group) were included in our multivariate logistic regression model.
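For a single binary predictor, the odds ratio from a univariate logistic regression equals the cross-product ratio of the corresponding 2x2 table, and a Wald 95% confidence interval follows from the standard error of the log odds ratio. The sketch below illustrates that calculation; the counts are invented purely for illustration and are not survey results, and the function name is hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: "exposed" = higher-contact group,
# "outcome" = reported wearing a mask every time.
or_, lo, hi = odds_ratio_wald(a=60, b=40, c=75, d=25)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

An interval that excludes 1 corresponds to a difference that would be reported as significant at the 0.05 level; adjusting for further covariates, as in the multivariate model, requires fitting the regression itself rather than using a single table.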
Data analyses were performed using R software (version 4.1.1) with the mgcv and socialmixr packages. All figures were plotted using the R package ggplot2. Differences were considered statistically significant at P < 0.05.

Demographic characteristics of participants
We collected data from 15,635 participants (Table 1); 51.6% were male and 70.4% were under 20 years of age. Of the participants, 9.5%, 12.2%, 29.5%, and 48.8% were delivery workers, medical workers, preschoolers, and students, respectively. The household income per capita was above 50,000 RMB for 35.5% of the participants. The largest share (47.6%) of the participants were members of households having three or four members. Characteristics of participants stratified by occupation and province are presented in Supplementary Table S1. Participants whose households had one or two members reported more group contacts. Participants from Shanghai reported the fewest group contacts among the sites (Table 1). The distribution of the number of group contacts was right-skewed among the different occupations. Among the occupation groups, more than 40% of participants reported 20-29 group contacts; moreover, 22.94% of delivery workers and 14.13% of medical workers reported more than 100 group contacts, compared with 2.75% of preschoolers and 5.93% of primary and secondary school students (Figure 1A2). The average daily number of contacts per participant differed by (when reported) sex, age, education level, household income per capita, and household size among the four occupations in the three provinces (Supplementary Table S1). Among delivery workers, males reported more contacts than females (Qinghai: P = 0.046; Shanghai: P < 0.001; Zhejiang: P < 0.001). Among medical workers, preschoolers, and students, the number of contacts did not significantly differ between males and females. Younger delivery workers aged 18-34 had contact with more people than their older counterparts, and medical workers aged 18-24 and 35-44 had contact with more people than their older counterparts. Among preschoolers and students, the older the respondent, the more contacts were reported.
In the four occupation groups, the trends for contact number and household size differed across the three sites (Supplementary Table S1).
For all participants, the proportion of direct contact increased with the contact duration (Figure 2A), and the proportion of skin-to-skin contact was higher when we considered contact occurring at home or between family members (Figure 2B,C). Fewer skin-to-skin contacts occurred for first-time contacts (Figure 2D). The proportions of physical contacts stratified by occupation or province showed the abovementioned patterns (Supplementary Figure S2-S5). As the contact duration increased, so did the proportion of contacts happening at home and among family members (Figure 3A1,A2). For most (88.70%) of the participants with contact durations > 4 hours, the involved contacts were daily contacts (Figure 3A3). Of the daily contacts, 90.32% happened at home or in the workplace (Figure 3B1). As the contact frequency decreased, the proportion of contacts among family members or colleagues declined from 71.62% to 13.36% (Figure 3B2). Contact duration also declined as the contact frequency decreased (Figure 3B3).

Human-human contact matrix and assortativity of contacts
The overall q-indices for individual contacts and all contacts were 0.489 (95%CI: 0.487-0.492) and 0.550 (95%CI: 0.546-0.554), respectively. There was a pronounced diagonal in the overall individual contact matrix (Figure 4A1), indicating that participants in different age groups tended to mix assortatively by age. The diagonal was most pronounced in those aged 5 to 20 years, and least pronounced in those aged 45 years and above. For the general contact matrix (Figure [...] years. Preschoolers and students tended to have contact with same-age individuals and those aged 24-40 years. For medical workers, the individual contacts (Figure 4A3) revealed a diagonal starting around 15-50 years old for both contacts and participants. The above-described patterns were also observed in the contact matrices stratified by province and occupation (Supplementary Figure S6,S7). Among the three provinces, Qinghai showed more assortativity (individual contacts q-index = 0.543, 95%CI: 0.543-0.548; all contacts q-index = 0.641, 95%CI: 0.632-0.649) than the other two provinces.

Factors associated with contact frequency
In the GAMM regression model (Table 2 and Figure 5), all four occupation groups tended to show an increase in the contact number as the household size increased, although significant contributions were observed only for students and preschoolers. The number of contacts for female delivery workers was significantly lower than that for male delivery workers (OR: 0.50, 95%CI: 0.43-0.58). Household income per capita contributed significantly to the number of contacts for preschoolers and students (P<0.05). Students whose household incomes per capita per year exceeded 100,000 RMB had higher numbers of contacts than those whose household incomes were less than 10,000 RMB (Figure 5).

Proportion of access to public places and mask wearing
All participants were asked to report on whether they went to seven listed places (school/workplace, public transportation, training institution, outdoor public space, non-enclosed indoor public space, enclosed indoor public space, and medical place) and how often they wore a mask in the visited places. Of the participants, 93.0% had gone to a school/workplace, while only 64.4% had used public transportation. The place most commonly visited by delivery workers and medical workers was a non-enclosed indoor public space, while the place most commonly visited by preschoolers and students was a school/workplace (Supplementary Table S2). Shanghai had the highest proportions of those who reported always wearing a mask for each of the seven places, with especially high proportions seen for training institutions and medical places. Females had higher proportions of always wearing masks in public places. Delivery workers had the highest proportion of always wearing masks, followed in decreasing order by medical workers, preschoolers, and students. Compared with weekends, on weekdays participants reported higher proportions of always wearing masks in training institutions and medical places, but lower proportions in other places. The higher contact group had a lower proportion of always wearing masks (Supplementary Figure S8). Our univariate logistic regression suggested that most of the above differences were significant (P<0.05, Supplementary Figure S9). After we adjusted for participants' sex, province, and weekday/weekend, the association between contact level and mask wearing differed across the four occupations (Figure 6, Figure 7). For delivery workers, participants with a contact number in the middle of the range appeared to have a lower proportion of always wearing masks, although the difference was not significant (P>0.05). For medical workers, in workplaces and training institutions, participants with more contacts had higher proportions of always wearing masks, although again not significantly so (P>0.05).
For preschoolers and students, the proportion of respondents who reported always wearing masks in schools/workplaces and public transportation was lower for those with more contacts (P<0.05), whereas the proportion of respondents who reported always wearing masks in training institutions and medical places was higher for those with more contacts.

Discussion
In this large-scale cross-sectional survey of self-reported contact patterns in representative high-risk groups from three provinces of China, we found that the average number of individual contacts per person per day was 3.14 (95%CI: 3.13-3.15). This is significantly lower than pre-pandemic numbers reported from Europe (13.4) [4], Taiwan (12.5) [17], and Guangdong Province, China (16.7) [1], but similar to those reported in Canada (2.21 to 3.89) [18] and Europe (2 to 5) [19] after the outbreak of COVID-19. Our finding confirms that daily individual contacts under the existing COVID-19 prevention and control measures were several-fold lower than pre-pandemic levels (14.6 to 18.8) [3], but higher than during the strict COVID-19 social-distancing period (2.0 to 2.3) [3], when most interactions were restricted to the household level in China. Whereas daily individual contacts fluctuated only slightly between the period of school and public-place closures and the existing COVID-19 prevention and control measures, the numbers of group contacts increased [3] and were significantly larger than the numbers of individual contacts across the four occupation groups, suggesting that post-relaxation increases in contacts might be driven mainly by work and study [19].
Our contact matrix revealed that diagonal element strengths were obvious in all occupation groups and regions, and were particularly strong in the individual contacts. This indicates that contacts in all age groups were highly assortative, with interactions occurring much more frequently with others of a similar age group. This characteristic is known to shape the transmission of infectious disease [20], suggesting that it is crucial to account for age-specific susceptibility to infection.
For example, most individuals contacted by children and teenagers are of a very similar age [21][22][23], which is likely to be the main reason why children and teenagers represent an important conduit for the initial spread of close-contact infections in general and influenza in particular [5,24]. Similar to the findings of Prem et al. [25], we found that high assortativity of contacts is common in schools (here, students) and less apparent in working-age individuals in the workplace (here, medical workers and delivery workers). Medical workers and delivery workers need to contact patients and customers of all ages, and thus exhibited greater heterogeneity in the ages of their contacts. A more diverse age contact structure may provide a route for transmission to spread between medical workers (or delivery workers) and the rest of the population, resulting in a greater number of new infections [26].
Although the total contact number determines the potential frequency of exposure to infections, the risk of infection may depend more strongly on contact duration and physical contact [27,28]. We considered a number of different measures for "closeness of contact," including the duration and frequency of contact and whether skin-to-skin contact occurred. As previously reported [29,30], these measures correlated highly with one another, such that longer-duration contacts tended to be frequent and involve physical contact (and vice versa). Importantly, more intimate contacts are likely to carry a greater risk of transmission [31]. Furthermore, these types of contact tend to occur in distinct social settings: Skin-to-skin contacts typically occurred at home [32] or in the workplace, whereas non-physical contacts tended to occur in the transport sector. This variation has important implications for contact tracing during outbreaks of a new infection. Our results suggest that if efforts are concentrated on locating contacts in the home, school, and workplace, on average more than 75% of all contacts would be found.
Our research also found that the number of contacts increased with the household size, which is consistent with previous reports [3,33]. The reported contact numbers of students and preschoolers increased with the family income level, suggesting that students from higher-income families may have more opportunities to participate in various trainings and/or spend more time on social and leisure activities [33]. One study [34] showed that weekdays were associated with 23-28% more contacts than weekend days among students, perhaps indicating that students have fewer contacts with their classmates during weekends. Our results showed that delivery workers (OR = 1.33, 95%CI: 1.01-1.75) and medical workers (OR = 1.18, 95%CI: 1.04-1.34) in Shanghai were more likely to have contact with others than their counterparts in Qinghai. This might reflect that contact rates and patterns among individuals are associated with population density [35]. Medical workers aged 25-35 and >60 usually had lower numbers of contacts than those aged 39-49, which might reflect that medical interns and doctors approaching retirement had fewer opportunities to contact patients.
Combined with social distancing, wearing a face mask can effectively flatten the epidemic curve [36,37]. Mask wearing was found to be significantly more prevalent among delivery workers and medical workers compared to students and preschoolers, suggesting that long-lasting COVID-19 behavior norms are likely to persist well in the former two high-risk populations. For preschoolers and students with more contacts, the proportion of those who reported always wearing masks was lower (P<0.05) in schools/workplaces and public transportation than for preschoolers and students with fewer contacts, suggesting that parents should make efforts to improve mask-wearing behavior in these groups [38]. Most notably, the percentages of mask wearing in the different contact groups did not differ much, regardless of whether or not the data were stratified by the occupation groups. Our findings indicate that participants with a lower contact number also maintained their mask-wearing behavior, suggesting that mask wearing may have become a behavioral habit for residents [39]. Previous research found that sex, age, location, health consciousness, and knowledge of disease all factor into whether members of the public wear a mask [40,41]. This may explain the tendencies observed herein for females to have higher proportions of always wearing masks (a sex-related difference), and for participants in Shanghai to be more likely to keep wearing masks in public spaces (a location-related difference).
This study is prone to the limitations pertaining to social contact surveys. The contact survey and mask-wearing results presented in this study are based on self-reported contacts, and may thus be affected by various biases, including recall bias and self-reporting bias. In the future, this could be avoided by using a prospective design with advanced notification or instruction in face-to-face interviews. As this study focused on high-risk populations, caution should be used if seeking to apply our results to the general population.

Conclusions
Higher heterogeneity in the age of contacts for delivery workers and medical workers might contribute to the transmission of respiratory infection-based diseases. Efforts should be made to improve the mask-wearing rate among preschoolers and students, considering their susceptibility and lower mask-wearing rate. Contact screening work should be concentrated in the home, school, and workplace after an outbreak of an epidemic, as more than 75% of all contacts, on average, will be found in these settings. Age-stratified and occupation-specific social contact research in high-risk groups can help inform policy-making decisions during the post-relaxation period of the COVID-19 pandemic.

Figure 4
Contact matrices by age. Each cell of the matrix represents the mean number of contacts that an individual in a given age group has with other individuals, stratified by age group. The color intensity represents the number of individual contacts (A1-A5) or all contacts (B1-B5). To construct the matrix, we performed bootstrap sampling with replacement of survey participants weighted by the age distributions of the actual populations of Shanghai, Zhejiang, and Qinghai. Each cell of the matrix represents an average over 100 bootstrapped realizations.

Figure 5
Estimated numbers of contacts in regression models for the different occupation groups, with 95% confidence intervals denoted by shaded regions.

Figure 6
The proportion of participants who reported always wearing masks in different places, for different contact groups, stratified by occupation group.

Figure 7
Associations of always wearing a mask with different contact levels. OR (dots) and 95%CI (error bars) were calculated from multivariate logistic regression after we adjusted for sex, province, and weekday/weekend.
