A simple method to handle non-response bias in school surveys on drug use

Background: Studies have established a positive link between truancy and substance use in school surveys. Despite this, no adapted weighting treatment is generally applied, even when the share of missing and truant pupils is high, and all drug use estimates are then biased downward. The necessary data can be collected: on one side, the individual current drug use and past episodes of absence and truancy of the respondents; on the other, the count of present and absent pupils on the day of the survey, including truants, in each class. However, the nature of these data prevents any classical modelling of the survey response without additional assumptions. Methods: We review one method proposed in 2002 by Guttmacher et al. that uses only the individual data, and we propose two methods that combine both kinds of data and that may or may not distinguish between truancy and legitimate absence. We apply them to the French release of the 2015 ESPAD survey (European survey project on alcohol and other drugs). The theoretical number of pupils was n=7166; 981 were absent (including 359 truants), 178 were discarded because of the poor quality of their questionnaires, and 6007 were considered final respondents. Assumptions, point estimates and variances are compared. Results: Guttmacher's method is not conceptually valid and can lead to irrelevant corrections with high variances. Our estimate of regular cannabis use is 8.6% (std=0.75) instead of 7.7% (std=0.67), that is, a non-response bias of circa 14%. Conclusion: The proposed approach relies on simple and plausible assumptions; it is preferable to any speculative consideration about the magnitude of the underestimation yielded by the classical weighting procedures. Survey designers should evaluate and discuss the potential bias of their surveys and, where needed, correct it.

and $\bar{R}$. As R is unknown (and has to be estimated with assumptions), the indicator most commonly reported is thus the response rate $\bar{R}$, as in (Eq. 1): the higher $\bar{R}$, the lower the bias.

In general population surveys, the response rate is considered poorly correlated with the non-response bias (2, 3). But non-response bias can arise when the topic of the survey itself relates to the response to the survey, that is, when the missing units contribute a disproportionately high share of the global estimate; this is what the term relating the response probability to the outcome reflects in formula (2). Classical examples include surveys on drug use (in which drug users may be hard to reach). In such situations, classical weighting procedures like post-stratification and calibration are unable to correct for the non-response bias, which is then called non-ignorable (4).

School surveys on drug use: the importance of school skipping
School surveys are often considered immune to non-response bias because their response rates $\bar{R}$ are usually very high. It is true that almost all present pupils respond to the survey questionnaire, but it is nevertheless not uncommon that roughly 10%-15% of the pupils are missing (5, 6) (page 18). Do the drug use levels of these missing pupils differ so much from those of the respondents that specific statistical treatments are needed to provide unbiased drug use estimates? The answer is probably yes, for two reasons. First, among respondents, there are strong positive correlations between the number of truancy days (absence without justification) during the past 30 days and the number of times pupils have used alcohol, tobacco and cannabis. This is acknowledged in each of the 35 participating countries (7) (page 198) in the 2003 report of the ESPAD survey (European survey project on alcohol and other drugs). Similarly, positive correlations between legitimate absence (with a parental justification) or illegitimate absence (without parental justification) and drug use have also been documented in US school surveys for the period 2000-2008 (8). In fact, the link between school skipping, truancy or disengagement from school and drug use has been documented in western countries (9-13), as well as in other cultural contexts (14).

This situation is thus paradoxical, considering that all the relevant variables seem available to reweigh the respondents. The number of absences of the missing pupils is the only information that is lacking.

Attempts to correct for the non-response bias
At first sight, we do not need the class-level information: the probability that a unit is absent depends only on its characteristics. The reported history of absences of the respondents can be used to approximate their probability of responding to the survey. If a respondent reports 5 days of absence within the last 30 days, one can estimate that he/she was present only 15 days out of the 20 school days that this period contains on average. As a consequence, his/her weight has to be multiplied by a factor 20/15=1.33. This is the procedure followed by (27) and by Guttmacher et al. (2002). This approach relies on a reasonable implicit assumption: the day of the data collection can be considered random and independent of the pattern of absences of the pupils (see Limitations). The idea dates back to 1949 (28), when it was proposed as a means to avoid multiple visits in face-to-face surveys.

In the case of a school survey, however, this approach has three problems. First, it uses the records of past absences of the respondents without considering the number of respondents on the day of the data collection, that is, the amount of information lost because of missing pupils: without any link between the two, this is not a genuine modelling. Second, because of this, it can lead to irrelevant increases of the weighted number of respondents. Imagine for example a class of 30 pupils where none is absent and all respond to the survey, but where 10 pupils each report 10 days of past absence: the weighted number of pupils in the class will be T=20 + 10×20/10=40. The only option (which is not used by Guttmacher et al.) is to reweigh the pupils, which gives the 20 present pupils without any past absence a weight below 1 without any justification.
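The arithmetic of this presence-based correction can be sketched in a few lines of Python. This is only an illustration of the two numerical examples given above; the function name `presence_weight_factor` and the default of 20 school days are assumptions made for the sketch, not part of the original procedure's code.

```python
def presence_weight_factor(days_absent, school_days=20):
    """Inverse of the estimated presence probability over the reference period.

    Assumes (as in the text) an average of 20 school days in the last 30 days.
    """
    if not 0 <= days_absent < school_days:
        raise ValueError("days_absent must be in [0, school_days)")
    return school_days / (school_days - days_absent)


# A respondent reporting 5 missed days gets a factor of 20/15.
example_factor = presence_weight_factor(5)

# Class of 30 with no absentee on the survey day: 20 pupils report no
# absence and 10 pupils each report 10 missed days.
class_absences = [0] * 20 + [10] * 10
weighted_class_size = sum(presence_weight_factor(d) for d in class_absences)

print(round(example_factor, 2))   # factor for 5 missed days
print(weighted_class_size)        # weighted size of a class of 30
```

The weighted class size exceeds the real class size even though nobody was missing, which is the inflation problem described above.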

Objectives
We propose two simple weightings used as a total non-response correction. Our approach accounts for the probability of being present in class on a given day, and for the numbers of present and absent pupils on the day of the survey data collection: it assumes a direct link between the two kinds of information.

Sampling
The sampling frame is the national list of secondary schools, classes and pupils (including their gender and age) from the Ministry of Education, drawn in January. The sampling design consists of a stratified sampling of classes; strata are combinations of academic field (8 categories), educational sector (private or public) and city size. In each selected school of each stratum, two classes were selected, with unequal probabilities.

Survey protocol
An advance letter informs the parents that a survey on health and lifestyle will take place within a few weeks: the precise topic and the day of the data collection are not given. The teachers must keep this information confidential and must not communicate the topic or the day to the pupils. The data collection consists of a pen-and-paper survey during one teaching period (50 min). At the end of the period, all questionnaires, filled in or not, are placed in an envelope that is sealed and sent directly to the data capture centre. The data collection is supervised by a professional supervisor whose role is to introduce the survey, to ensure anonymity and confidentiality, and to show some examples of the future use of the data by the researchers. He/she also fills in a classroom report. Questionnaires with a high item non-response rate (>50%) or of poor quality are discarded.

Absence and truancy
On the first page of the ESPAD questionnaire, respondents are asked to report the number of days they missed school in the last 30 days according to three motives: illness, truancy (no motive) and other reasons. The response scale is 0, 1, 2, 3-4, 5-6 and 7+ days, from which we derived the number of missed days by motive: 0, 1, 2, 3.5, 5.5 and 7.5. The three motives were aggregated into two categories: illegitimate absences (i.e. truancy) and legitimate absences (illness and other reasons).

The classroom report contains the number of present and absent pupils by gender. Three categories of absences are distinguished: legitimate (there is a parental proof), uncertain (classmates claim that the absence is legitimate), and illegitimate/truancy (no parental proof and no claim). We simply distinguished the illegitimate absences (truancy) from the others.
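The recoding of the questionnaire scale described above can be sketched as follows. The category labels used as dictionary keys and the function name `missed_days` are illustrative assumptions; only the numeric values and the two-category aggregation come from the text.

```python
# Response categories mapped to the day values stated in the text.
SCALE_TO_DAYS = {"0": 0.0, "1": 1.0, "2": 2.0, "3-4": 3.5, "5-6": 5.5, "7+": 7.5}


def missed_days(illness, truancy, other):
    """Aggregate the three reported motives into the two categories used here."""
    legitimate = SCALE_TO_DAYS[illness] + SCALE_TO_DAYS[other]
    illegitimate = SCALE_TO_DAYS[truancy]
    return {
        "legitimate": legitimate,
        "illegitimate": illegitimate,
        "total": legitimate + illegitimate,
    }


# A pupil reporting 3-4 days of illness, 1 day of truancy, no other absence.
example = missed_days(illness="3-4", truancy="1", other="0")
print(example)
```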

Definition of respondents and non-respondents
We consider as respondents the present pupils whose questionnaire was retained as valid. Missing pupils and pupils with discarded questionnaires were considered as non-respondents, the latter as missing with a legitimate motive.

Outcomes
The variables of interest are dichotomous indicators: regular alcohol use and regular cannabis use (at least 10 uses in the last 30 days) as well as daily tobacco smoking (at least one cigarette a day in the last 30 days). These binary indicators are key variables in the French monitoring of drug use in youth.

Missing values
Missing values in the respondents' reports of past absences and truancy were imputed with a random hot-deck procedure based on class, gender, and whether the questionnaire was discarded or not.
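For readers unfamiliar with random hot-deck imputation, the following is a minimal sketch of the general technique, not the authors' actual implementation: each missing value is replaced by the value of a donor drawn at random among records in the same cell. The field names (`class_id`, `gender`, `discarded`, `truancy_days`) and the function `random_hotdeck` are hypothetical.

```python
import random
from collections import defaultdict


def random_hotdeck(records, target, cell_keys, seed=0):
    """Fill missing `target` values with a random donor value from the same cell."""
    rng = random.Random(seed)
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[k] for k in cell_keys)].append(rec[target])
    completed = []
    for rec in records:
        new = dict(rec)  # leave the input records untouched
        if new[target] is None:
            pool = donors.get(tuple(rec[k] for k in cell_keys))
            if pool:  # stays missing when the cell has no donor
                new[target] = rng.choice(pool)
        completed.append(new)
    return completed


pupils = [
    {"class_id": "A", "gender": "F", "discarded": False, "truancy_days": 0.0},
    {"class_id": "A", "gender": "F", "discarded": False, "truancy_days": 2.0},
    {"class_id": "A", "gender": "F", "discarded": False, "truancy_days": None},
    {"class_id": "A", "gender": "M", "discarded": False, "truancy_days": 3.5},
    {"class_id": "A", "gender": "M", "discarded": False, "truancy_days": None},
]
completed = random_hotdeck(pupils, "truancy_days", ("class_id", "gender", "discarded"))
```

In this sketch a record whose cell contains no donor is left missing; a production procedure would typically fall back on a coarser cell.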

Total non-response corrections
We implement three total non-response correction (TNRC) methods that aim to correct the sampling weight so that the weighted respondent sample has the same size as the theoretical number of pupils.

where $T(\text{absent boys/girls})_L$ is the number of boys/girls that are non-respondents on the day of the data collection, $N(\text{missed school days})_i$ has been defined above and $T(\text{past missed school days})_L$ is the total number of reported missed school days among respondents in the stratum L. If nobody is absent on the day of the data collection, then $C_0=0$ and $p_{2Li}=W_i$. If no respondent reports a past absence, then $p_{2Li}=1$ (this did not happen in our case).

3/ In method 3, we extend method 2 by distinguishing the type of absences reported by the respondents and recorded by the supervisors during the survey:

Underlying assumptions
In all methods, we assume that there is no self-selection relating to the survey: the day of the survey is random and no pupil chose to be absent because of the survey (H1). Second, as in every weighting technique, we assume that the unobserved drug use of the absent pupils or truants can be estimated by the observed drug use of the respondents (H2), conditionally on specific reports of past absences or truancy episodes (hypothesis of conditional exchangeability). In Guttmacher's original technique, exchangeability is assumed to hold unconditionally, whatever the specificities of the pupils: sex, type of school, educational sector, etc. By contrast, using the number of missing pupils at the level of the sampling stratum is more informative, because strata encompass the educational specificities shared by many classes, which relate to the patterns of absence from school and of drug use (see Table 3). We also assume that a missed school day reported by a respondent would have been (counterfactually) recorded as an absence by the survey supervisor (H3). H3 makes the modelling of the total non-response possible.

In method 3, we add two assumptions. H4: the illegitimate (respectively legitimate) absences, whether reported by the respondents or recorded by the survey supervisor, belong to the same category. That is, we assume that any reported truancy episode has been recorded as such by a teacher during a regular class and would counterfactually be recorded as such by the survey supervisor. H4 is a natural extension of H3. H5: we refine the conditional exchangeability assumption (H2). We assume that, regarding drug use, the respondents who report truancy episodes represent the pupils missing without parental justification on the day of data collection (i.e. the current truants) better than those who report legitimate absences only.

Statistics
All statistics (weighted or not) were computed with PROC SURVEYFREQ in SAS V9.4, accounting for the sampling design (strata, with classes as clusters) to obtain unbiased estimates of the standard deviation.

Description of the sample
The selected sample of the 2015 French ESPAD survey comprised 284 classes and 7,166 pupils, among which only 6,185 (86.3%) were present and 981 (13.7%) were absent during data collection (including 16 parental refusals). Among the 981 absent pupils, 359 (36.6%) were truant and 116 (11.7%) had an uncertain status, the others having a parental justification. Only 6,185 questionnaires were filled in (85.9%), and 6,007 were retained in the final respondent sample (83.8%) because 178 questionnaires had to be discarded due to major incompleteness or poor data quality. As a consequence, 1,159 pupils (almost 16% of the initial sample) were considered missing (among which 31% were current truants). We considered the 116 absent pupils with an uncertain status as absent with a legitimate motive.

The partial non-response rates in the report of past absences were low in the retained questionnaires (5.4% for illness, 8.4% for truancy and 7.1% for other reasons) and higher in the discarded questionnaires (16%, 17% and 18%, respectively). The mean number of reported missed school days was also higher in the discarded questionnaires than in the retained questionnaires (6.2 vs 2.6 days), as was the proportion of missed days due to truancy (37% vs 25%). Among the retained questionnaires, boys reported fewer past missed school days excluding truancy than girls (average 2.5 vs 2.8, p<0.001) but a similar average number of truancy days: 0.7 vs 0.6, p=0.5.

Reported past absences and drug use
The Pearson correlation coefficients between the outcomes and the reported absences and truancy of the respondents are shown in Table 1. The coefficients with legitimate absences were lower than those with truancy. Correlations were weak for regular alcohol use (rho circa 0.03) but stronger for daily tobacco smoking and regular cannabis use (rho close to 0.2). The coefficients for smoking and cannabis with truancy were somewhat higher among girls than boys. As a consequence, drug use levels were higher in respondents who reported an absence in the last 30 days than among the others, and especially high among those who reported episodes of truancy (Table 2). For example, in girls, the prevalence of daily tobacco smoking was 15.7% among respondents with no reported absence but 26.9% among the others (19.1% among those with legitimate absences only and 41.0% among those with reported episodes of truancy). Respondents with only legitimate absences and those with no absence at all had very similar drug use levels (Table 2).

Truancy is thus clearly the key parameter for any TNRC procedure. However, the number of days of absence (legitimate or not) and the number of truancy episodes were strongly correlated (r=0.68 in boys and r=0.65 in girls), which justifies trying a TNRC procedure that does not distinguish them.

How much bias can there be?
According to equation (Eq. 1), the maximum (but unrealistic) bias would be observed if all non-respondents were drug users. With 16.2% of non-respondents and an (unweighted) prevalence of regular cannabis use among respondents of 7.6%, the true value would be 22.5% (Table 2). Using individual reports of absence allows a more plausible estimate to be computed. If the unobserved proportion of regular cannabis users among missing pupils were equal to the proportion observed among respondents who reported a past episode of truancy (15.6%), the true proportion of regular users would be 8.9%, 1.3 points (or 17%) above the unweighted value, which may be considered large enough to justify correcting the data. However, the true bias remains unknown.
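These bounds can be checked with a short computation. The sketch below assumes that (Eq. 1), which is not reproduced in this section, amounts to the usual decomposition of the overall prevalence into a weighted mix of respondents and non-respondents; the function name `overall_prevalence` is illustrative. The non-response rate is taken as 1,159 missing pupils out of 7,166, as reported in the description of the sample.

```python
def overall_prevalence(p_respondents, p_missing, nonresponse_rate):
    """Prevalence in the full sample as a mix of respondents and missing pupils."""
    return (1 - nonresponse_rate) * p_respondents + nonresponse_rate * p_missing


NONRESPONSE_RATE = 1159 / 7166   # about 16.2% of the initial sample
P_RESPONDENTS = 0.076            # unweighted regular cannabis use among respondents

# Unrealistic upper bound: every missing pupil is a regular user.
worst_case = overall_prevalence(P_RESPONDENTS, 1.0, NONRESPONSE_RATE)

# More plausible scenario: missing pupils resemble respondents reporting truancy.
plausible = overall_prevalence(P_RESPONDENTS, 0.156, NONRESPONSE_RATE)
relative_bias = (plausible - P_RESPONDENTS) / P_RESPONDENTS

print(f"{worst_case:.1%}, {plausible:.1%}, {relative_bias:.0%}")
```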

Strata and drug use
The correlations between the outcomes and the reported number of past missed school days (any absence) or past skipped days (truancy) vary greatly by stratum (Table 3), as do the levels of drug use: this supports computing the TNRC of methods 2 and 3 at this level instead of at the global level as in method 1.

Effects of the TNRC weighting procedures
The three TNRC methods perform differently in reconstituting the theoretical number of pupils: Guttmacher's method led to a large overestimation (n=7998.3 vs 7166), even with truncation of the weights (n=7482.3), whereas methods 2 and 3 yielded the exact total (Table 4: Sum without sampling weight). Results were similar when the sampling weight was taken into account (Table 4: Sum with sampling weight).

The TNRC increased the variance of the weights (measured by the coefficient of variation, CV). For method 1 (Guttmacher), the truncation (CV=82.4) yields a much lower variance than the original method (CV=118.9). For methods 2 and 3, the variance was lower because each individual correction contributes only to a share of the missing pupils in the stratum. As expected, the differences between methods 2 and 3 were very small because of the high correlation between the numbers of past absences and of past truancy episodes (r=0.68 in boys and r=0.65 in girls). The final calibration reduced the differences between the methods: the CV varies between 81 (for methods 2 and 3) and 89.7 or 118.9 (for method 1 with or without truncation). Note that the sampling design contributes substantially to the variance of the weights, as the CV for the sampling weight is already 68.7 before calibration and 76.1 after.

Table 5 shows the estimates of the outcomes with the different weighting schemes before calibration. As expected, given the correlations observed in Table 1, the levels of tobacco and cannabis use were corrected upward more than the level of regular alcohol use. As expected again, method 3 yielded only slightly greater estimates than method 2. Method 1 (Guttmacher) yielded the largest corrections, especially when the weights were not truncated (i.e. in the original method): the corrected prevalence of regular cannabis use was 9.5% (std=0.86), that is, an increase of 25% compared to the unweighted prevalence (7.6%). According to Table 2 and formula (Eq. 1), it would mean that the proportion of regular cannabis use is 19.5% among the missing pupils: a higher value than the prevalence observed among the respondents who reported episodes of truancy (15.6%). The results obtained with the truncation were more realistic, and so were the results obtained with methods 2 and 3: the corrected prevalence of regular cannabis use is 8.7%, that is, a relative increase of 14% compared to the raw estimate.

The results obtained after the final calibration are very similar (Table 6). Before and after calibration, the standard deviations obtained with Guttmacher's method were higher than those obtained with methods 2 and 3, as suggested by the higher CV of the weights (Table 4). None of the corrected estimates fell outside the confidence interval of the classical estimates.
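The contrast between a presence-based correction and a correction tied to the number of missing pupils can be illustrated on a toy stratum. The sketch below is an assumed reading, not the authors' formulas: the second function simply allocates the pupils missing on the survey day to respondents in proportion to their reported missed days, which is one allocation consistent with the properties stated for method 2 (no correction when nobody is absent, exact reconstitution of the theoretical total). The function names and the toy data are hypothetical.

```python
def presence_factors(missed_days, school_days=20):
    """Presence-based factors: inverse of the estimated presence probability."""
    return [school_days / (school_days - d) for d in missed_days]


def share_allocation_factors(missed_days, n_absent):
    """ASSUMED allocation: spread the n_absent missing pupils over respondents
    in proportion to their reported missed days."""
    total_missed = sum(missed_days)
    if n_absent == 0 or total_missed == 0:
        return [1.0 for _ in missed_days]  # nothing to allocate
    return [1.0 + n_absent * d / total_missed for d in missed_days]


# Toy stratum: 30 respondents (20 with no absence, 10 with 10 missed days)
# and 5 pupils missing on the day of data collection.
respondents_missed_days = [0] * 20 + [10] * 10
n_absent_on_survey_day = 5
theoretical_size = len(respondents_missed_days) + n_absent_on_survey_day

total_presence = sum(presence_factors(respondents_missed_days))
total_share = sum(
    share_allocation_factors(respondents_missed_days, n_absent_on_survey_day)
)
print(theoretical_size, total_presence, total_share)
```

In this toy case the presence-based total overshoots the theoretical size, whereas the proportional allocation reproduces it by construction.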

Summary of the findings
To our knowledge, this is the first study comparing different methods aiming to correct the potential non-response bias relating to missing pupils in a school survey. Our approach relies on a few simple assumptions and provides estimates of the true values based on all the available information, which is preferable to any speculative consideration about the magnitude of the underestimation yielded by the classical weighting procedures. In addition, the increase in variance is small. Because it ignores the amount of information lost to non-response (described by the number of missing pupils on the day of the data collection), Guttmacher's method (2002) is not a sound model and leads to irrelevant corrections with higher variances.

Limitations
Our results are based on some strong assumptions that may be challenged.

H1 is common to the three methods. It is reasonable because the topic of the survey and the precise day of the survey are not known in advance by the pupils and their parents: it is difficult to imagine that absence on the day of the survey is caused by the survey itself.

H2 is common to every weighting technique: the respondents can represent the non-respondents given some characteristics relating to their probability of response and the level of the outcomes. In our case, the literature emphasises the role of truancy as a key parameter of drug use. As truants report higher levels of drug use than the others (15-18), and as the number of truancy episodes was correlated with higher levels of drug use among the respondents of our survey, the assumption seems reasonable. The validity of H2 also requires that the dropouts either present the same drug use level as the other truants or represent only a negligible share of them, which holds because the share of dropouts is very low at 15-16 years old in France, where school is still mandatory (less than 1%). That the respondents who report past truancy episodes are more similar to the truants on the day of the data collection than those without such reports (H5) relies on the same basis.

Conversely, we hypothesise that the pupils with no reported absence or truancy cannot represent pupils who are actually absent, although they show very similar drug use levels. This increases the weights by construction but should not add bias.

The accuracy of the TNRC procedures relies on the accuracy of the data; this is a prerequisite for our five assumptions, but it is especially the case for H3, which is at the core of our approach: its validity relies on the honesty of the respondents. H4 and H5 have the same strength and weakness. If the pupils pass off truancy episodes as legitimate absences, the total number of missed school days will be more reliable than the distinct counts of legitimate and illegitimate absences, which is an argument in favour of method 2. Such misreporting would be consistent with a social desirability bias (29). At the class level, it is likely that a proportion of the apparently legitimate absences are in fact illegitimate, and vice versa, but this proportion should be low.

The discarded questionnaires have been considered as questionnaires of absent pupils (and not truant pupils), despite showing much higher rates of absence and truancy than the retained questionnaires. This is arbitrary, as they present higher shares of reported truancy; but one reason for this choice is that they were less trustworthy, since they showed higher rates of missing values for past absences.

In our case, the correlations between truancy and drug use do not differ much by gender, and ignoring gender in methods 2 and 3 would provide very similar results. However, it is important to show that this important determinant of drug use and school attendance can easily be taken into account.

Combining the number of missing pupils in class and the individual reports of past missed school days by the respondents allows the non-response bias to be estimated and corrected in a simple way that should be applied in every school survey.

Ethics approval and consent to participate
Ethics approval was not required for this school survey. The data are anonymous and confidential and protected by the National Committee on Informatics and Liberty. The survey was not mandatory; parents could refuse the participation of their children and the pupils could refuse to participate.

This manuscript has been seen and approved by all authors, who have been personally and actively involved in substantive work leading to this article, and will hold themselves jointly and individually responsible for its content.

The authors have no conflicts of interest to declare.

The ESPAD survey is funded by the French Monitoring Centre for Drugs and Drug Addiction.