Mask-wearing and social distancing: Evidence from a video-observational and natural-experimental study of public space behavior during the COVID-19 pandemic


 Face masks have been widely employed as a personal protective measure during the COVID-19 pandemic. However, concerns remain that masks create a false sense of security that reduces adherence to other public health measures, including social distancing. This paper tested whether mask-wearing was negatively associated with social distancing compliance. In two studies, we combined video-observational records of public mask-wearing in two Dutch cities with a natural-experimental approach to evaluate the effect of an area-based mask mandate. We found no observational evidence of an association between mask-wearing and social distancing, but found a positive link between crowding and social distancing violations. Our natural-experimental analysis showed that an area-based mask mandate did not significantly affect social distancing or crowding levels. Our results alleviate the concern that mask use reduces social distancing compliance or increases crowding levels. On the other hand, crowding reduction may be a viable strategy to mitigate social distancing violations.


Introduction
During the COVID-19 pandemic, most countries recommend or mandate the use of face masks in public places. This measure aligns with a growing consensus that mask-wearing by members of the public is effective in mitigating coronavirus transmission 1,2. However, there remain concerns that mask use may have unintended adverse behavioral effects, including that mask-wearing creates a false sense of security, which reduces compliance with other key mitigation measures 3,4. This is attributed to a 'risk compensation' mechanism that leads individuals to behave more riskily in situations they perceive as safer 5.
While such concerns may have played a role in delaying the adoption of personal protective measures during the COVID-19 pandemic 6, the evidence supporting the risk compensation hypothesis remains controversial and fragile 7. For example, the wearing of ski or bicycle helmets, often presented as a paradigmatic example of risk compensation, is not robustly linked with riskier practices 8,9, and similarly, the evidence does not support the claim that mask use adversely affects hand hygiene 6. Nevertheless, mask use has been speculated to adversely affect adherence to social distancing directives 10, i.e., avoiding physical contact and minimizing gatherings 11. The assumption is that if individuals wearing masks feel protected, they may adhere less to social distancing directives, and likewise, other persons may feel that it is safe to have close encounters with a mask-wearer 12.
Since the onset of the COVID-19 pandemic, several studies have offered conflicting evidence for risk compensation with respect to social distancing. In line with the hypothesis, one study found that masks reduced social distancing in an online experiment with responses to scenarios visualizing test subjects and others with and without masks 13. Similarly, survey data found that although masks were not linked to people's concern with the number of close contacts (and hygiene), mask users reported less concern about keeping distance and crowd avoidance 14. Further, this agrees with geo-tracked mobility data showing that mask mandates lead to less stay-at-home compliance 4. However, rejecting the risk compensation hypothesis, field experiments have consistently found that pedestrians 15 and people queuing for shopping 16,17 keep a greater social distance from masked than from unmasked persons. Here, it was also found that masked persons themselves did not keep a shorter distance from others and that mask mandates do not lead to less distancing compliance 16. Moreover, survey studies have found that mask-wearing was linked with greater concern about avoidance of others in public places 18, and geo-tracked mobility data further showed that people stayed at home more when masks became mandatory 19.
On the face of it, the literature offers a somewhat mixed picture of the link between mask use and social distancing. Yet arguably, the sub-body of field-experimental research rejecting the risk compensation hypothesis should be given the most evidential weight, given its reliance on direct records of real-life mask use and social distancing behavior (Baumeister et al. 2007). To further examine whether this is indeed the case, the current study applies an alternative method to provide high-resolution data on real-life public behavior: video-assisted naturalistic observation 20. Specifically, across two studies in two major Dutch cities (one observational and one that also includes a natural experiment), we used video footage to examine mask use and distancing behavior in public spaces during the COVID-19 pandemic. We tested the individual-level expectation of the risk compensation theory that mask-wearers keep less distance, as well as the parallel expectation at the aggregated level that an area-based mask mandate would make public places more crowded.

Study 1
Methods
Data were a sample of video-observed individuals recorded by municipality-operated public security cameras in the Netherlands (data and materials are available at osf.io/j7guw). We were granted permission to use the recordings for scientific purposes by the Dutch Attorney General, Ministry of Public Affairs.
The Ethics Committee for Legal and Criminological Research at the Vrije Universiteit Amsterdam approved the study.
We obtained access to more than 60,000 hours of footage across 63 cameras located in Amsterdam and Rotterdam. For Study 1, we selected recordings from a single camera in Amsterdam (to minimize between-context heterogeneity), which had high recording quality and captured a pedestrianized street that allowed for continuous observation of pedestrians. We included 60 hours of footage from five days (three Thursdays, one Saturday, one Sunday), recorded during daytime hours between May and the beginning of June 2020.
Coding procedure. Two trained research assistants coded data following a codebook developed for the study. The interrater reliability of the codebook was evaluated by independently double coding 44 individuals and 25 contexts. All included variables had a Gwet's 21 AC1/AC2 score larger than .8, indicating good interrater agreement (each score is noted in the Measures section below). The coding began by randomly selecting 51 30-minute segments across the 60 hours of footage included. If possible, we then observed seven persons with a mask and, to construct a relatively balanced sample, seven persons without a mask for each segment. In total, we sampled 383 persons (176 with and 207 without a mask) for an average of 25 seconds (SD = 7.4) and a total of 158 observation minutes. This satisfied an a priori power analysis suggesting that 339 cases would detect a small effect (f² = 0.05), with a power of 90%, and a conservative alpha of .005 22. The small effect size assumed in the power analysis was established from what we considered a lower threshold of practical significance 23. Note that we coded beyond the required number of observations to have a buffer for missing data.
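To make the agreement statistic concrete, the following is a minimal sketch of Gwet's AC1 for two raters and nominal categories, using the standard formula AC1 = (p_a - p_e) / (1 - p_e), where p_a is the observed agreement and p_e is the chance agreement based on the average category prevalence across raters. The function name `gwet_ac1` and the toy ratings are hypothetical and are not taken from the study's data or code; the weighted AC2 variant used for ordinal items is not covered.

```python
from collections import Counter


def gwet_ac1(ratings_a, ratings_b, categories=None):
    """Gwet's AC1 for two raters and nominal categories (illustrative sketch)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of units.")
    n = len(ratings_a)
    if categories is None:
        # Assumption: the observed labels cover all possible categories.
        categories = sorted(set(ratings_a) | set(ratings_b))
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories; pass `categories`.")

    # Observed agreement: share of units coded identically by both raters.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: based on each category's prevalence averaged over raters.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    p_e = 0.0
    for k in categories:
        pi_k = (counts_a[k] / n + counts_b[k] / n) / 2
        p_e += pi_k * (1 - pi_k)
    p_e /= q - 1

    return (p_a - p_e) / (1 - p_e)


# Hypothetical double-coded units: 1 = within 1.5 m of a stranger, 0 = not.
coder_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
coder_2 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
ac1 = gwet_ac1(coder_1, coder_2)
print(round(ac1, 3))
```

In this toy example the two coders disagree on one of ten units, so the observed agreement is 0.9 and the resulting AC1 sits just above the .8 threshold mentioned above.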
Measures. The dependent variable was captured as a binary variable distinguishing whether or not the observed individual was within a 1.5-meter radius of a stranger (AC1 = .92), i.e., the official Dutch threshold for social distancing. Whether the other person was a stranger or affiliated was inferred from whether they arrived at the scene together and walked in each other's company 24. To assess the coding of interpersonal distance, we utilized the exact dimensions of street tiles as a 'ruler.' Note that we also, as an alternative 'high-risk' version of the dependent variable, measured social distancing with a 0.5-meter cutpoint (AC1 = .89).
The independent variable was a binary measure, distinguishing whether the person wore a face mask or not (AC1 = 1.0). Face masks included respirators (e.g., N95), surgical masks, and cloth masks; persons wearing face shields or improvised face coverings (e.g., bandanas, scarves) were excluded. We also excluded persons wearing masks covering neither the nose nor the mouth (e.g., hanging under the chin) or who changed the mask's placement (i.e., between facial areas, or putting it on/off). Finally, we included some control variables in the observational analysis: a visual assessment of the person's age (AC2 = .90) and gender (AC1 = .96), and a measure of crowding captured as a count of the number of persons moving through each segment (AC2 = 1.0).
These results remained unchanged after controlling for whether the persons were in an area where mask-wearing was voluntary or mandatory. To minimize the risk that Study 1 and Study 2 were underpowered to identify a potential minute effect of masks on distancing when estimated separately, we analyzed a dataset pooled from the two studies 28. This analysis confirmed the non-significant result of masks (β = 0.01, CI 95% [-0.04, 0.07], p = .584).

Results
Next, the natural-experimental data were analyzed with a difference-in-difference regression 29, estimated as second differences 30. A manipulation check found that the area-based mask mandate increased the proportion of mask-wearing by more than 30 percentage points (second difference = 0.32, p < .001), suggesting a relatively successful implementation of the treatment. Note that this result corresponds with systematic field observations we conducted in the treatment areas, which suggested uncertainty about where wearing a mask was obligatory. Further, the mask mandate treatment did not influence the level of crowding (second difference = -5.77, p = .126). We highlight that this result to some extent hinged on how the models were specified. Specifically, if specified without cluster-robust standard errors (which, however, are recommended 31), the model yielded a significant (yet somewhat fragile) negative difference-in-difference estimate (i.e., direct counterevidence for the risk compensation hypothesis). Finally, we found that the mask mandate did not affect the individual-level likelihood of social distancing encounters (second difference = 0.036, p = .781), and this result remained non-significant after controlling for crowding.
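As a rough illustration of the second-difference logic, the sketch below computes an unadjusted difference-in-difference point estimate from four cell means: the pre-to-post change in the treatment areas minus the pre-to-post change in the control areas. The function name `second_difference` and the toy records are hypothetical. The sketch does not reproduce the regression models reported here: it includes no covariates and no cluster-robust standard errors, so it yields a point estimate only, without p-values.

```python
from statistics import mean


def second_difference(records):
    """Unadjusted difference-in-difference estimate (illustrative sketch).

    `records` is an iterable of (treated, post, outcome) tuples, where
    `treated` marks a mandate area and `post` marks the post-mandate period.
    """
    cells = {}
    for treated, post, outcome in records:
        cells.setdefault((bool(treated), bool(post)), []).append(outcome)

    needed = [(True, True), (True, False), (False, True), (False, False)]
    missing = [cell for cell in needed if cell not in cells]
    if missing:
        raise ValueError(f"No observations for cells: {missing}")

    means = {cell: mean(values) for cell, values in cells.items()}
    # First differences: pre-to-post change within each group of areas.
    change_treated = means[(True, True)] - means[(True, False)]
    change_control = means[(False, True)] - means[(False, False)]
    # Second difference: change in treated areas net of the control trend.
    return change_treated - change_control


# Hypothetical records: outcome 1 = person wears a mask, 0 = no mask.
toy_records = (
    [(True, False, y) for y in (0, 0, 0, 1)]     # treated areas, baseline
    + [(True, True, y) for y in (1, 1, 1, 0)]    # treated areas, post-mandate
    + [(False, False, y) for y in (0, 0, 1, 0)]  # control areas, baseline
    + [(False, True, y) for y in (0, 1, 0, 0)]   # control areas, post-mandate
)
estimate = second_difference(toy_records)
print(estimate)
```

With these made-up records, mask-wearing rises by 50 percentage points in the treated areas and is unchanged in the control areas, so the second difference is 0.5.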

Study 2
Methods
We designed Study 2 as a replication of Study 1 and, as such, applied the same codebook (tested for interrater reliability), coding strategy, and measures as in Study 1. However, Study 2 also utilized a natural-experimental situation 27, with the municipalities of Amsterdam and Rotterdam implementing mask mandates in densely crowded areas (e.g., tourist and shopping areas). The mask mandate was announced in local media and onsite with signs and by municipal workers, and occasionally by the police reprimanding or fining non-compliers.
We collected around 500 hours of recordings from six cameras with a high recording quality. Three cameras were located in intervention areas where a mask mandate was enforced, and three were located in comparable control areas. The raw footage covered 13 days (Wednesdays, Saturdays, one Sunday), with four days (across two weeks) of pre-intervention baseline measures and nine post-intervention days (across four weeks). From this sample, we randomly selected 78 30-minute segments, across which a team of twelve trained research assistants observed 423 persons (167 with and 256 without a mask), for an average of 23 seconds and a total of 164 minutes of observation. Finally, at randomly selected time points across the treatment and control areas, we further took 358 records of crowding and 342 records of the proportion of people wearing a mask.
Note that not all of these six locations had a tile layer, which we utilized in the interrater reliability test of social distancing in Study 1. As such, the records of this measure may be noisier in Study 2 than in Study 1. Also note that the measure of the proportion of mask-wearers was introduced after the interrater reliability assessment had been conducted. However, we have indirect evidence that the interrater agreement is good, given that this measure is a combination of the mask-wearing and crowding measures, both of which were assessed to have near-perfect agreement.

Discussion
It would be a cause for concern if face masks reduced the adherence to social distancing directives, as predicted by the risk compensation hypothesis. The current study helps alleviate that concern, with internally replicated observational evidence for the absence of a mask-distancing association, and natural-experimental data showing that a mask mandate did not influence social distancing and crowding levels.
Although the literature on risk compensation around mask use is sparse and of somewhat mixed quality, the few studies that have applied direct observations of individual behaviors do not support the risk compensation hypothesis 17. The current observational and experimental studies, which examined both voluntary and mandatory mask settings, also rejected this hypothesis. As such, our findings add valuable information to the literature suggesting that face mask-wearing is not likely to lead to a reduction in social distancing compliance or an increase in crowding levels due to a false sense of security.
As an alternative to a risk-compensation explanation of social distancing, the current results suggest, in line with prior research 33,34, that social distancing violations are chiefly predicted by crowding. That is, when citizens move through crowded public places, it is more challenging to keep the desired 1.5-meter distance from other people. A practical implication of this argument is that COVID-19 interventions targeting social distancing violations may find utility in crowd control 35 and in targeting the street infrastructure that shapes public crowding 36.
A limitation of the current study is that we do not distinguish whether it is the potential mask-wearer or their counterparts who have the primary role in stepping over the 1.5-meter distancing threshold. Further, we acknowledge the quasi-experimental nature of Study 2, with a non-random assignment of the area-based treatment and, thus, an increased risk that our results are affected by unaccounted confounders. Note, however, that the analysis of social distancing did account for a potential key confounder, i.e., crowding, that we identified in our observational analyses. We also acknowledge that our experimental approach is constrained by a short follow-up period of four weeks, and that the reported null findings may be due to an insufficiently powerful treatment, as indicated by the only partially successful manipulation check. Finally, it should be mentioned that the null results of the difference-in-difference analyses may be due to us underestimating how large a sample should be to identify interaction effects (i.e., around four times larger than for main effects 32). As such, the study was de facto powered to identify 'medium'-sized interaction effects (rather than 'small'-sized main effects, as planned). This, in turn, may have inflated the false-negative error risk of the difference-in-difference analyses.
In conclusion, the current studies provided observational and natural-experimental evidence that mask-wearing does not reduce social distancing or increase crowding, under either voluntary or mandatory conditions.

38. Gelman, A. Scaling regression inputs by dividing by two standard deviations. Stat. Med. 27, 2865-2873 (2008).

Figure 1. Regression analyses of observational data of social distancing violations in Study 1 and Study 2.
Note. Linear probability model estimates, with 95% and 99.5% confidence intervals (two-tailed). All models controlled for the time duration of each observation. The continuous age and crowding items were standardized to make them comparable to binary predictors 38.
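The standardization mentioned in the figure note can be sketched as follows, assuming it follows the approach in the cited reference: each continuous input is centered and divided by two standard deviations. Because a binary predictor with roughly equal group sizes has a standard deviation of about 0.5, a continuous input rescaled this way has a coefficient on approximately the same scale. The function name `scale_by_two_sd` and the example values are hypothetical and are not the study's data or code.

```python
from statistics import mean, stdev


def scale_by_two_sd(values):
    """Center a continuous variable and divide by two sample SDs (sketch)."""
    if len(values) < 2:
        raise ValueError("Need at least two observations to compute an SD.")
    center = mean(values)
    spread = stdev(values)
    if spread == 0:
        raise ValueError("Cannot rescale a constant variable.")
    return [(v - center) / (2 * spread) for v in values]


# Hypothetical crowding counts (persons moving through a segment).
crowding = [20, 30, 40, 50, 60]
crowding_scaled = scale_by_two_sd(crowding)

# After rescaling, the variable has mean 0 and SD 0.5, the SD of a
# binary predictor that splits the sample evenly.
print(round(mean(crowding_scaled), 6), round(stdev(crowding_scaled), 6))
```

A one-unit change in the rescaled variable then corresponds to a two-standard-deviation change in the original variable, which is the comparison the figure note refers to.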