Development of the Canadian Substance Use Exposure Database (CanSUED): Modeling the prevalence of substance use in Canadian Jurisdictions, 2006 to 2017

Small area and regional estimates of substance use (SU) exposures are increasingly needed to support estimation of the burden of SU-attributable morbidity and mortality. There is also a need to assess SU prevalence for subgroups by place, time and sociodemographic characteristics to plan the efficient delivery of treatment and harm reduction services. However, the data available from national surveys are often insufficient to produce reliable estimates for subgroups because of small sample sizes. Estimates are also often missing for some jurisdictions and for years when no surveys were conducted. We describe new methods that use Canadian national survey data on SU, sales data, SU-attributable hospitalisations and demographic data to develop the Canadian Substance Use Exposure Database (CanSUED). Estimates from this database have been used in the study of Canadian Substance Use Costs and Harm (CSUCH).

The model-based estimates were significantly related to the design-based estimates produced from both the CADUMS/CTADS and the CCHS.

Conclusion
The mixed model-adjusted approaches produced reliable estimates for small areas and age-gender groups and helped fill gaps caused by data suppression in local and national surveys. We suggest that these methods provide the most comprehensive and reliable estimates available of Canadian substance use by substance category, year, jurisdiction, age and gender. The methods could also be applied in other countries where similar data are available.

Background
Small area and domain estimates of substance use (SU) exposure are needed to support estimation of the burden of SU-attributable harms such as SU-attributable morbidity and mortality. There is also a need to assess SU prevalence for sub-groups defined by place, time and sociodemographic characteristics, to plan the delivery of prevention, treatment and harm reduction services efficiently.
However, the sampling methods used in national surveys such as the Canadian Alcohol and Drug Use Monitoring Survey (CADUMS) and the Canadian Tobacco, Alcohol and Drug Survey (CTADS) were designed to produce maximum precision of estimates when reporting at the provincial and national levels. The estimates directly produced from such national surveys may be unreliable for population subgroups, for regions smaller than a province and where rates of use of a particular substance are low [1]. The available surveys have not always been conducted annually and have not included the three territories in Canada. The aim of this study is thus to present a methodology that overcomes these limitations in available survey estimates by using small area estimation methods supplemented with additional jurisdiction-specific data on SU-attributable hospitalisations, alcohol and tobacco sales and demographic characteristics. In essence, these methods estimate patterns in the data by region, year, age, gender and type of SU and, with large and relevant auxiliary data sets, extrapolate these patterns to create more reliable estimates, especially where data are sparse.
This exercise was conducted in order to develop the Canadian Substance Use Exposure Database (CanSUED), an open access database containing exposure and prevalence estimates for eight categories of SU by six age-sex groups in 13 Canadian provinces and territories from 2006 to 2017. The estimates were used in the assessment of Canadian Substance Use Costs and Harm (CSUCH; www.csuch.ca) [2] to overcome problems with missing or suppressed data. The eight categories of SU are alcohol, tobacco, cannabis, opioids, other central nervous system (CNS) depressants (e.g., benzodiazepines, barbiturates), cocaine, other CNS stimulants (e.g., amphetamine, methamphetamine, ecstasy) and other substances (e.g., hallucinogens, inhalants). There is also a need to assess SU in particular populations, such as those in the three territories, and in age and sex subgroups regionally, in order to plan the delivery of treatment and harm reduction services efficiently.
There are several approaches to small area estimation (SAE) that have been developed and used to produce estimates when reliable estimates cannot be obtained directly from surveys for any of the above reasons [3,4]. One approach is the composite estimator called empirical best linear unbiased prediction (EBLUP). EBLUP has been used to combine cross-sectional and time-series data [4].
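For orientation, the composite form of an area-level EBLUP is commonly written as shown below. This is the textbook Fay-Herriot-type expression, offered as an assumption about notation rather than the exact model fitted in this study; the symbols (direct estimate, covariate vector, regression coefficients, between-area variance and sampling variance) are introduced here only for illustration.

```latex
\hat{\theta}_i^{\mathrm{EBLUP}}
  = \hat{\gamma}_i \, \hat{\theta}_i^{\mathrm{dir}}
  + (1 - \hat{\gamma}_i) \, \mathbf{x}_i^{\top} \hat{\boldsymbol{\beta}},
\qquad
\hat{\gamma}_i = \frac{\hat{\sigma}_v^{2}}{\hat{\sigma}_v^{2} + \psi_i}
```

Here the weight on the direct estimate shrinks toward zero as the sampling variance of area i grows, so imprecise direct estimates are pulled toward the regression prediction based on auxiliary data.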
To develop our estimates, we first produced direct estimates of SU for six age-sex groups in each of the ten provinces for survey years, adjusted for survey design effects [5]. We then used multilevel models [6] that combined the direct estimates of SU (i.e., mean annual alcohol consumption and prevalence of SU) with relevant auxiliary data (alcohol and tobacco sales and wholly SU-attributable hospitalisations) to predict the estimates using the empirical best linear unbiased prediction (EBLUP) approach [3]. All the estimates were broken down by sex and age group (15-34, 35-64 and 65+) for the ten provinces, the three territories and the whole of Canada for each year from 2006 to 2017.
Several auxiliary data sources were used to produce reliable estimates for province-age-sex groups when no reliable estimates could be produced directly from the surveys, or when no survey data were available (i.e., 2006, 2007, 2014 and 2016). These auxiliary data include age-sex population counts in provinces/territories over years [15], per capita alcohol and tobacco sales data, and counts of wholly SU-attributable hospitalisations.

Survey sampling and population coverage
The CADUMS was a yearly survey on alcohol and other SU among Canadians initiated in April 2008 by the Controlled Substances and Tobacco Directorate, Health Canada [7-11]. The survey was derived from the Canadian Addiction Survey administered in 2004 and contained questions on substance use (including prescription drug misuse) and associated harms [19]. From 2013, the same SU questions were carried forward into the CTADS [12,13]. Both the CADUMS and the CTADS used a two-stage (telephone household, respondent) random sample stratified by province, covering all 10 provinces with equal representation of subjects each month.
The surveys used random-digit dialing (RDD) methods via Computer Assisted Telephone Interviewing (CATI). The sampling approach was designed to produce maximum precision of estimates when reporting at the provincial level by sex and the national level by sex and major age groups.
The sampling frame was based on an electronic inventory of all active telephone area codes and exchanges in Canada. Within each of the 10 provincial strata, a random sample of telephone numbers was selected with equal probability in the first stage of selection (i.e., households). Within selected households, one respondent aged 15 years or older who could complete the interview in English or French was chosen. The person who would celebrate his/her birthday next within the household was asked to complete the interview. The surveys covered the population aged 15 years and older in the ten provinces (see Table A4 in Appendix A).

Measures of substance use
The CADUMS and CTADS core content included self-report questions concerning general health and well-being, smoking status, alcohol use and harms, pharmaceutical use, cannabis use and harms, other illicit SU (opioids, cocaine, other CNS stimulants and depressants) and harms, alcohol and cannabis and driving, pregnancy and SU, and demographics. The questions on SU are presented in Table A5 in Appendix A. Specific indicators analyzed in this study are described below. These exposure estimates of SU were needed to help estimate the number of SU-attributable conditions in the CSUCH study [2].

Tobacco smoking
Measures of tobacco smoking included the prevalence of lifetime non-smokers, former smokers and current smokers. Lifetime non-smokers were those who had smoked fewer than 100 cigarettes in their lifetime. Former smokers were those who had smoked at least 100 cigarettes but did not smoke daily or occasionally at the time of the survey. Current smokers were those who smoked daily or occasionally when they were surveyed.
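The three smoking categories above can be illustrated with a minimal Python sketch. The function name, the response coding ('daily', 'occasionally', 'not at all'), the weights and the rule that current smoking takes priority over the 100-cigarette criterion are all assumptions made for illustration; they are not taken from the survey codebooks.

```python
from collections import defaultdict


def smoking_status(smoked_100_lifetime: bool, current_frequency: str) -> str:
    """Classify one respondent into the three categories described in the text.

    current_frequency uses a hypothetical coding: 'daily', 'occasionally'
    or 'not at all'. Current smoking is checked first (an assumption).
    """
    if current_frequency in ("daily", "occasionally"):
        return "current smoker"
    if smoked_100_lifetime:
        return "former smoker"
    return "lifetime non-smoker"


# Hypothetical respondents: (smoked >= 100 cigarettes, current frequency, survey weight)
respondents = [
    (True, "daily", 2.0),
    (True, "not at all", 1.0),
    (False, "not at all", 1.0),
    (False, "occasionally", 1.0),
    (True, "not at all", 3.0),
]

# Weighted prevalence of each category: sum of weights in the category / total weight.
weight_by_status = defaultdict(float)
for smoked_100, frequency, weight in respondents:
    weight_by_status[smoking_status(smoked_100, frequency)] += weight

total_weight = sum(weight_by_status.values())
prevalence = {status: w / total_weight for status, w in weight_by_status.items()}
```

The three prevalences sum to one by construction, because every respondent falls into exactly one category.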

Other substance use
The use of cannabis, opioids (illicit or prescribed pain relievers), other CNS depressants (sedatives, tranquilizers), cocaine, other CNS stimulants (amphetamine, methamphetamines, ecstasy and any other stimulants) and other psychoactive substances (hallucinogens, inhalants, etc.) in the past year was assessed. In addition, some SU-related conditions are causally associated with injection drug use (IDU) and an additional analysis was carried out regarding IDU, which was restricted to SU types with injection as a possible route of administration (opioids, cocaine, other CNS stimulants). The proportions of those reporting use of these substances among those aged 15 years and older in the past 12 months were estimated.

Analytical strategy to estimate substance use exposures
We developed a statistical model to estimate trends and patterns observed across all the available survey data sets so as to allow reliable estimation of suppressed or otherwise missing data. In our analyses, an estimate with a coefficient of variation (CV) greater than 33.3% was considered unreliable and was modelled using the methods described below. Specifically, we produced indirect estimates of SU prevalence by using auxiliary information and borrowing strength from (1) data collected in neighbouring areas, (2) data collected at other times, (3) the spatial correlation in the data across regions and (4) the temporal correlation of the target variable in each area. Indirect estimators borrow strength from other areas and/or time periods to increase the effective sample size. These indirect estimates were based on implicit or explicit models that provide a link to related areas and/or time periods through supplementary information such as recent census counts or current administrative records related to the variable of interest.
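The 33.3% CV rule can be illustrated with a short Python sketch. The variance formula below uses Kish's effective sample size as a rough stand-in for a full design-based variance; this is an assumption made for illustration and is not the survey procedure used in the study. All function names and data are hypothetical.

```python
import math

CV_THRESHOLD = 33.3  # percent; estimates with a larger CV are treated as unreliable


def direct_prevalence(indicators, weights):
    """Return (prevalence, standard error, CV in percent) for a 0/1 indicator.

    The standard error is approximated with Kish's effective sample size,
    n_eff = (sum of weights)^2 / (sum of squared weights).
    """
    total_w = sum(weights)
    p = sum(w * y for w, y in zip(weights, indicators)) / total_w
    n_eff = total_w ** 2 / sum(w * w for w in weights)
    se = math.sqrt(p * (1 - p) / n_eff)
    cv = 100 * se / p if p > 0 else float("inf")
    return p, se, cv


def is_reliable(cv_percent):
    """Apply the reliability rule: a CV greater than 33.3% is unreliable."""
    return cv_percent <= CV_THRESHOLD


# Hypothetical large group: 40 users among 200 equally weighted respondents.
p_large, se_large, cv_large = direct_prevalence([1] * 40 + [0] * 160, [1.0] * 200)

# Hypothetical small group: 2 users among 40 equally weighted respondents.
p_small, se_small, cv_small = direct_prevalence([1] * 2 + [0] * 38, [1.0] * 40)
```

In this toy example the large group passes the rule while the small, low-prevalence group does not, which is the situation in which a model-based estimate would be substituted.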

Direct estimates
Direct estimates of self-reported SU were obtained from the surveys with adjustment for design effects due to strata, clustering and disproportionate selection of subjects in the surveys [5]. Statistical analyses were completed using SAS 9.3 [24]. Direct estimates of mean alcohol consumption were produced using the SAS SURVEYMEANS procedure, and percentages of substance users and non-users were estimated using the SAS SURVEYFREQ procedure, because these procedures analyze sample survey data taking into account the sample design effects [24]. Direct estimates were produced by age, sex, province/territory and year.
The SAS MIXED procedure was used to perform multilevel regression of the direct estimates, in which province/territory and year were treated as random effects and auxiliary data were included as fixed-effect covariates [3,25]. The auxiliary data were the year-province-age-sex population, rates of wholly SU-attributable conditions available by age and gender for all 13 jurisdictions by year, and annual per capita cigarette sales and litres of alcohol from official sales data at the province level. The procedure estimates the fixed-effects parameters and then produces the EBLUP estimates. The EBLUP method [3] was used to predict the estimates for all six age-sex groups by year in the ten provinces and three territories for 2006-2017.
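The general idea of an EBLUP can be sketched in Python with a simplified Fay-Herriot-type area-level model with a single covariate. This is not the SAS MIXED specification used in the study, which includes random effects for province/territory and year. The function name, the Prasad-Rao moment estimator of the between-area variance and all data values are assumptions chosen for illustration.

```python
def fit_area_level_eblup(direct, samp_var, x, x_new=()):
    """Simplified Fay-Herriot-type EBLUP with one covariate (illustrative only).

    direct   : direct survey estimates for areas that have data
    samp_var : their sampling variances, treated as known
    x        : auxiliary covariate for those areas
    x_new    : covariate values for areas with no direct estimate
    """
    m = len(direct)
    x_bar = sum(x) / m
    y_bar = sum(direct) / m
    sxx = sum((xi - x_bar) ** 2 for xi in x)

    # Ordinary least squares fit, used only to estimate the between-area variance.
    b1_ols = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, direct)) / sxx
    b0_ols = y_bar - b1_ols * x_bar
    resid = [yi - (b0_ols + b1_ols * xi) for xi, yi in zip(x, direct)]
    leverage = [1 / m + (xi - x_bar) ** 2 / sxx for xi in x]

    # Prasad-Rao moment estimator of the between-area variance, truncated at zero.
    a_hat = (sum(r * r for r in resid)
             - sum(v * (1 - h) for v, h in zip(samp_var, leverage))) / (m - 2)
    a_hat = max(a_hat, 0.0)

    # Weighted least squares with weights 1 / (a_hat + sampling variance).
    w = [1 / (a_hat + v) for v in samp_var]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, direct)) / sw
    b1 = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, direct))
          / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    b0 = yw - b1 * xw

    # Composite estimate: shrink each direct estimate toward its regression prediction.
    gamma = [a_hat / (a_hat + v) for v in samp_var]
    synthetic = [b0 + b1 * xi for xi in x]
    eblup = [g * yi + (1 - g) * si for g, yi, si in zip(gamma, direct, synthetic)]

    # Areas without survey data receive the regression (synthetic) prediction only.
    predicted_new = [b0 + b1 * xi for xi in x_new]
    return {"a_hat": a_hat, "gamma": gamma, "synthetic": synthetic,
            "eblup": eblup, "predicted_new": predicted_new}


# Hypothetical inputs: prevalence estimates, sampling variances and a sales covariate.
direct = [0.10, 0.21, 0.24, 0.18, 0.33, 0.19]
samp_var = [0.0001, 0.0002, 0.0004, 0.0001, 0.0006, 0.0003]
sales = [1.0, 2.0, 3.0, 1.5, 3.5, 2.5]
result = fit_area_level_eblup(direct, samp_var, sales, x_new=[2.2, 0.8])
```

Each EBLUP value lies between the direct estimate and the regression prediction for that area, which is the smoothing behaviour expected of a composite estimator. Areas passed through `x_new` mimic jurisdictions or years with no survey, where only the auxiliary data inform the estimate.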

Validity assessments
We conducted several internal validity checks of the model-based EBLUP estimates. First, we compared the EBLUP estimates against the CADUMS/CTADS design-based direct survey estimates of per capita alcohol consumption and the prevalence of SU for age-sex groups by province and year where there were reliable estimates, i.e., CVs <33.3%. We further compared the EBLUP estimates with the prevalence estimates from the Canadian Community Health Survey (CCHS) where the CVs of the estimates by age-sex groups were smaller than 33.3%. The CCHS, conducted by Statistics Canada, has a large sample size (a total of 984,911 Canadians were surveyed in 2005-2014) but only provided equivalent questions for some key alcohol and tobacco indicators for the Yukon, the Northwest Territories and Nunavut. More details on the CCHS can be found elsewhere [12,13,26,27]. Bivariate correlation was used to assess the relationship between the EBLUP estimates and the direct estimates; we estimated the Pearson correlation coefficient for each pair of estimates.

Nationwide and province-level estimates of substance use

Results
The CADUMS and CTADS can produce relatively reliable estimates of alcohol, tobacco and other substance use at the provincial and national levels (Table 1). Table 5 presents the means of the prevalence estimates of other substance use, exclusive of alcohol and tobacco, which include past-year use of any prescribed or illicit drugs, cannabis (marijuana, hashish, hash oil or other cannabis derivatives), other CNS depressants (sedatives and tranquilizers), cocaine, opioids (prescribed or non-prescribed), other CNS stimulants (amphetamines, methamphetamines, ecstasy and other stimulants such as Ritalin, Concerta, Adderall and Dexedrine), IDU and any other psychoactive substance such as hallucinogens and inhalants. No direct estimates were produced from CCHS data because data on only some types of substances were collected, and only in the samples of two provinces. Few subjects in the CADUMS/CTADS reported use of cocaine, other CNS stimulants, IDU or any other psychoactive substance; therefore, the direct estimates of these types of SU for age-sex groups were either unreliable or missing. The direct estimates of the prevalence of using any prescribed drug or illicit substance were reliable; none of these estimates had CVs larger than 33.3%. Furthermore, the means of the EBLUP and direct estimates of any prescribed or illicit substance use were very close, and the two sets of estimates were highly correlated.
Additionally, we performed mixed regression analyses of the direct estimates and the model-based estimates for each type of SU, in which province was treated as a random effect, age and sex as categorical variables and year as a continuous variable, to examine whether differences by age, sex, province and year in the model-based estimates of SU were consistent with those in the direct estimates.
There were significant differences in the prevalence of past-12-month use for nearly all types of SU across age and sex groups, as well as variations across provinces and years (Table A10 in Appendix A), but the direct-survey and model-based estimates showed very consistent results.

Discussion
Lack of reliable and comprehensive information on SU by age-sex subgroups, region and year has been a persistent problem for policy makers and researchers attempting to describe SU trends, estimate comparative SU harms and costs, plan service delivery and evaluate impacts of prevention strategies. Small area estimation that borrows too much strength from larger areas or groups is likely to result in diminished local variances. We produced model-based estimates by incorporating consistent and reliable auxiliary variables from Health Canada, the Canadian Institute for Health Information and Statistics Canada at the provincial and territorial levels to supplement this lack of information.
In this study, we used the model-based EBLUP method to produce estimates of SU exposure by six sex-age groups for each of 13 provinces and territories from 2006 to 2017 based on two Canadian national surveys on alcohol, tobacco, cannabis, opioids, cocaine and other categories of SU. The EBLUP method was used because the sample sizes are small for sex-age groups in survey years, no surveys were conducted in 2006-2007, 2014 and 2016, and the national surveys did not cover the population in the three territories. We were able to demonstrate that incorporating available and relevant auxiliary datasets, such as rates of SU-specific hospitalisation, alcohol and tobacco sales data and population demographics, resulted in more reliable and comprehensive estimates across year and jurisdiction than are currently available from self-report surveys alone. Specifically, our methods generated reliable estimates of levels of substance use across all major categories of legal and illegal SU, covering years missed in national survey data, jurisdictions not usually covered (i.e., the three territories) and types of substance where prevalence of use is low.
Our results show that the EBLUP estimator had a smaller estimated mean squared error than the design-based direct estimator. Our analyses show that the model-based estimates had smaller ranges than the design-based direct estimates from the CADUMS/CTADS and CCHS and tended to smooth out the group variations in SU estimates in year-province-sex-age groups. This is to be expected, since small area statistical models generalize population characteristics and tend to smooth the final prediction of population outcomes and underestimate the true range of the estimates.
Our validation analysis showed that the model-based EBLUP estimates had high internal consistency with the design-based direct estimates with CVs of less than 33.3% and high consistency with reliable external design-based direct estimates. The main validation results empirically confirmed that the model-based EBLUP approach could provide reliable and sensible estimates of SU indicators in the population using the nationwide province-based SU monitoring surveys.
The CCHS is a cross-sectional survey that collects information related to health status, health care utilization and health determinants for Canadians. It surveys a larger sample of respondents and is designed to provide reliable estimates at the health region level within provinces. Our analyses show that more reliable estimates (mainly of alcohol and tobacco use), i.e., more estimates with CVs of less than 33.3% for year-province-sex-age groups, were produced from the CCHS than from the CADUMS/CTADS.

Conclusion
We have employed these methodologies to create a comprehensive Canadian Substance Use Exposure Database (CanSUED) by three age and two sex groups covering all 13 provinces and territories for 2006 to 2017. We recommend CanSUED for use by researchers and policymakers whose goal is to better understand SU in Canada, to recognize trends and patterns in SU or to estimate the harms and costs caused by SU in society. The analyses presented show that the application of statistical techniques designed to deal with sparse and missing data through the inclusion of relevant auxiliary data results in more robust estimates of SU prevalence.

Ethics approval and consent to participate
The estimates of substance use in this study involve only secondary analysis of existing survey data (public files) and administrative data obtained from Statistics Canada and Health Canada. All datasets are publicly available and human subjects are anonymous.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Funding
The estimates of substance use in this study were conducted in the project of "Canadian Substance

Box II. Equations for computing the prevalence of substance use and its variance, standard error and coefficient of variation