Study population
The recruitment and baseline evaluations of the 51,338 CLSA participants aged 45-85 years at enrolment were completed in 2015 [19]. The complete CLSA cohort is composed of the Tracking cohort of 21,241 participants, who provide data via telephone interviews, and the Comprehensive cohort of 30,097 participants, who provide data via in-person home interviews and visits to a data-collection site. Comprehensive participants provided data in English and French on all regularly used drugs and NHP.
Drug and NHP data collection / mapping drug data to Health Canada database
In the first step of a 3-step process, drug and NHP data were entered in the CLSA data collection software by interviewers who were trained to identify the relevant information from medication packaging. During an in-home visit, CLSA interviewers asked participants to present all regularly scheduled or taken medications (i.e., scheduled, once a day, every other day, taken occasionally, as required), including prescription, non-prescription, over-the-counter (OTC), herbals, vitamins or NHP, in all routes of administration. The interviewer entered the generic name (e.g., atorvastatin), the trade name (e.g., Lipitor) or the drug identification number (DIN) (e.g., 02230711) in a type-to-search box that mapped the drug input to the Health Canada DPD and generated a list of corresponding generic or trade drug names. When no adequate drug name correspondence was found, the name or DIN was entered as a free-text/numeric input. Since the type-to-search box was not mapped to the Health Canada Licensed Natural Health Products Database (LNHPD), NHP were entered as free-text/numeric inputs. The interviewer also recorded information about the dosage, frequency, duration, start date and indications for use.
Drugs authorized for sale by Health Canada are listed in the Health Canada DPD [21], which contains information notably on product name, list of active ingredients, DIN and World Health Organization (WHO) anatomical therapeutic chemical (ATC) classification. NHP licensed by Health Canada are listed in the Health Canada LNHPD [22], which contains information notably on product name, the product’s medicinal ingredients, the product’s non-medicinal ingredients and natural product number (NPN). The NHP database does not include ATC codes. Both databases are updated nightly.
Algorithm recoding
In a second step, sequential algorithms were applied to map free-text (drug or NHP names) or numeric (DINs or NPNs) inputs to the products of the Health Canada drug and NHP databases. Seven software algorithms were developed independently of the sample data (Table 1). The algorithms were run sequentially such that once an input was matched, it was no longer considered by the remaining algorithms. For a given input, each algorithm attempted to map the input to the drug database and then to the NHP database before the next algorithm was tried. The Direct and Code algorithms were run first since they only ever matched a single input to a single drug or NHP, while the Word and Simple algorithms at times found multiple matches. In cases of multiple matches due to numerous dosage strengths, the input was matched to the suitable drug or NHP with the lowest DIN or NPN.
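The sequential logic described above can be illustrated with a minimal Python sketch (the study itself used PHP and MySQL, and its code is not reproduced here). The function names, the `(identifier, name)` product tuples and the order of the three matchers are assumptions made for illustration; the actual algorithm sequence is given in the Supplementary Text. The "lowest DIN or NPN" tie-break is applied to all hits, without modelling the judgement of which product is "suitable".

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical product record: (DIN or NPN as a string, product name).
Product = Tuple[str, str]
Matcher = Callable[[str, List[Product]], List[Product]]


def direct(inp: str, products: List[Product]) -> List[Product]:
    """Input is identical to the product name."""
    return [p for p in products if p[1] == inp]


def code(inp: str, products: List[Product]) -> List[Product]:
    """Input is identical to the DIN or NPN."""
    return [p for p in products if p[0] == inp]


def reverse_word(inp: str, products: List[Product]) -> List[Product]:
    """Input appears as whole words inside the product name (may hit several)."""
    return [p for p in products if f" {inp} " in f" {p[1]} "]


def run_cascade(
    inputs: List[str],
    matchers: List[Tuple[str, Matcher]],
    drug_db: List[Product],
    nhp_db: List[Product],
) -> Tuple[Dict[str, Tuple[str, Product]], List[str]]:
    """Run matchers in order; a matched input is dropped from later matchers."""
    matched: Dict[str, Tuple[str, Product]] = {}
    remaining = list(inputs)
    for name, matcher in matchers:
        still_unmatched = []
        for inp in remaining:
            # Try the drug database first, then the NHP database.
            hits = matcher(inp, drug_db) or matcher(inp, nhp_db)
            if hits:
                # Tie-break: keep the product with the lowest DIN or NPN.
                best = min(hits, key=lambda p: int(p[0]))
                matched[inp] = (name, best)
            else:
                still_unmatched.append(inp)
        remaining = still_unmatched
    return matched, remaining
```

Inputs left in the second return value correspond to those that would go on to manual recoding.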
Table 1. Developed algorithms
| Name | Description | Examples |
| --- | --- | --- |
| Code | The input is compared to the DIN or NPN. A match is found when the input is identical to the DIN or NPN. There can only ever be one match. | The input “02275619” matches the DIN “02275619”. In comparison, the input “0227-5619” does not match the DIN “02275619”. |
| Direct | The input is compared to the drug or NHP’s name. A match is found when the input is identical to the drug’s or NHP’s name (including all special characters, spaces, etc.). There can only ever be one match. | The input “TYLENOL ALLERGY” matches the drug name “TYLENOL ALLERGY”. In comparison, the input “TYLENOL ALLERGY 100MG” does not match the drug name “TYLENOL ALLERGY”. |
| Word | The input is compared to the drug or NHP’s name. A match is found when the drug’s or NHP’s name is found as a sub-string within the input. Spaces are considered such that only whole words can be matched. There may be multiple matches. | The input “LARGE TYLENOL SUPER RELIEF 100MG” matches the drug name “TYLENOL SUPER”. In comparison, the input “LARGE TYLENOL SUPERIOR RELIEF 100MG” does not match the drug name “TYLENOL SUPER”. |
| Simple | The input with all non-alpha-numeric characters removed is compared to the drug or NHP’s name with all non-alpha-numeric characters removed. A match is found when the two altered names are identical. There may be multiple matches. | The input “TYLENOL-ALLERGY (50-MG)” (transformed into “TYLENOLALLERGY50MG”) matches the drug name “TYLENOL ALLERGY 50MG” (transformed into “TYLENOLALLERGY50MG”). In comparison, the input “TYLENOL-ALLERGY (50-MG)” (transformed into “TYLENOLALLERGY50MG”) does not match the drug name “TYLENOL ALLERGY” (transformed into “TYLENOLALLERGY”). |
| Reverse-word | This algorithm is identical to “Word”, but the input is searched as a sub-string within the drug or NHP. | |
| No-Units | The input with all units of measurement removed. | The input “ASPIRIN COATED CAPLETS 500MG” would have the units, 500MG, removed and become “ASPIRIN COATED CAPLETS”. |
| Predefined | List of common drugs and NHPs established by our team. Inputs with predefined names would get coded first. | Aspirin, Vitamin B, Vitamin C, Vitamin D, multivitamin, etc. |
DIN, Drug identification number; NHP, Natural health product; NPN, natural product number.
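The matching rules in Table 1 can be made concrete with a short Python sketch that reproduces the table's own examples. This is an illustration under stated assumptions, not the study's PHP code: the function names are hypothetical, whole-word matching is approximated by padding both strings with spaces, and the list of units in `UNIT_PATTERN` is an assumed, incomplete set.

```python
import re


def code_match(inp: str, identifier: str) -> bool:
    """Code: the input is identical to the DIN or NPN."""
    return inp == identifier


def direct_match(inp: str, name: str) -> bool:
    """Direct: the input is identical to the product name."""
    return inp == name


def word_match(inp: str, name: str) -> bool:
    """Word: the product name occurs in the input as whole words."""
    return f" {name} " in f" {inp} "


def reverse_word_match(inp: str, name: str) -> bool:
    """Reverse-word: the input occurs in the product name as whole words."""
    return f" {inp} " in f" {name} "


def simplify(text: str) -> str:
    """Remove every non-alpha-numeric character."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


def simple_match(inp: str, name: str) -> bool:
    """Simple: the two names are identical once simplified."""
    return simplify(inp) == simplify(name)


# Assumed unit list for illustration only; the study's list is not given.
UNIT_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s*(?:MG|MCG|G|ML|IU)\b", re.IGNORECASE)


def strip_units(inp: str) -> str:
    """No-Units: drop quantities with units, then tidy the spacing."""
    return " ".join(UNIT_PATTERN.sub(" ", inp).split())
```

For example, `word_match("LARGE TYLENOL SUPER RELIEF 100MG", "TYLENOL SUPER")` is true, whereas the same call with "SUPERIOR" in the input is false because "SUPER" is not followed by a space. How the study handled letter case is not stated; this sketch compares strings as given.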
Work was conducted using SQL (a database scripting language) and PHP (a general-purpose programming language). The Health Canada databases and CLSA data were loaded into a secure MySQL database using SQL. Before the PHP algorithms were run, these databases were pre-processed to speed up matching. For instance, the Simple algorithm compared the unmapped inputs to drug and NHP names from the Health Canada databases while ignoring non-alpha-numeric characters, which required removing those characters from both the unmapped inputs and the Health Canada database names before comparing the two. Transforming the drug names in this way every time a comparison was made would have been slow. Instead, all drug names were converted once during the pre-processing step, and the converted names were reused each time a match was searched for. Another example of pre-processing is a list that was made of all identical drug and NHP names. The final version of the algorithm sequence and the variables used from the Health Canada databases are presented in the Supplementary Text.
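The benefit of converting the names once can be sketched in Python as a precomputed lookup table. The study performed this step in MySQL and PHP; the dictionary-based index and the names `build_simple_index` and `simple_lookup` are assumptions made for illustration only.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def simplify(text: str) -> str:
    """Remove every non-alpha-numeric character."""
    return re.sub(r"[^A-Za-z0-9]", "", text)


def build_simple_index(products: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Pre-processing: simplify each product name once and index it.

    `products` holds hypothetical (identifier, name) pairs; several
    products may share one simplified name, so each key maps to a list.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for identifier, name in products:
        index[simplify(name)].append(identifier)
    return index


def simple_lookup(inp: str, index: Dict[str, List[str]]) -> List[str]:
    """Matching: only the input is simplified at lookup time."""
    return index.get(simplify(inp), [])
```

With the index built once, each lookup simplifies only the input instead of re-transforming every product name in the database.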
As part of an iterative algorithm improvement approach, two pharmacists (LD, BC) independently recoded 40 unmapped drug and NHP inputs. The pharmacist-recoded inputs were compared to algorithm-recoded inputs during meetings of the research team, leading to algorithm refinement. This process of review – discussion – algorithm refinement was conducted three times for a total of 120 inputs, leading to two new algorithms: Predefined and No-Units (Table 1). The greater complexity of recoding NHP inputs compared to drug inputs was identified early in this process and discussed throughout our work.
Manual recoding
In a third recoding step, following the application of the algorithms to the unmapped drug and NHP data, the remaining unmapped de-identified data were exported directly from the CLSA’s database to an Excel file for manual recoding by 3 pharmacy technicians. The same group of recoders conducted the recoding and validation work. The recoders’ work was supported by a set of decision rules (Supplementary Text) to assign selected NPNs for the most prevalent NHP inputs (e.g., NPN=80083109 for calcium).
Spelling dictionary
As inputs were manually recoded, common misspellings were compiled into a dictionary and applied to future iterations of the computer algorithms. In the pre-processing stage, any misspelled word listed in the dictionary was replaced in the inputs with its correct spelling before the algorithms were run.
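A minimal Python sketch of such a dictionary-based correction is shown below. The dictionary entries are hypothetical examples, not taken from the study's dictionary, and the word-by-word, upper-case lookup is an assumption about how the replacement might be done.

```python
import re
from typing import Dict

# Hypothetical entries for illustration; the study's dictionary is not shown.
SPELLING = {"TYLENAL": "TYLENOL", "ASPRIN": "ASPIRIN"}


def correct_spelling(inp: str, dictionary: Dict[str, str]) -> str:
    """Replace each alphabetic word found in the dictionary; keep the rest."""

    def fix(match: "re.Match[str]") -> str:
        word = match.group(0)
        # Keys are assumed to be stored in upper case.
        return dictionary.get(word.upper(), word)

    return re.sub(r"[A-Za-z]+", fix, inp)
```

Applied before the matching algorithms, `correct_spelling("TYLENAL ALLERGY 50MG", SPELLING)` yields an input that the Direct or Simple algorithm could then match against a product name.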
Validation process
A validation sample of 100 Comprehensive cohort participants was randomly selected to evaluate the performance of the recoding algorithms and manual recoding. This sample included 352 free-text drug and NHP inputs for which a gold standard recoded input was determined independently by 2 recoders, with resolution of discrepancies by a pharmacist. A gold standard recoded input could not be established for some inputs due to insufficient input information. Differing commercial products of the same generic drug or NHP were considered to be an agreement. After this first validation, the algorithms were further refined and validated in a second sample of 544 Comprehensive cohort participants with 1407 unmapped free-text drug and NHP inputs. In this second validation, the gold standard recoded input was established by a single recoder, based on the agreement between recoders measured in the first validation.
Analysis
Manual recoding was considered the gold standard for free-text inputs. The proportion of inputs correctly recoded by the algorithms was calculated as the number of algorithm-recoded inputs that agreed with the gold standard divided by the number of algorithm-recoded inputs. In the primary analysis, the denominator included only the inputs for which a gold standard could be established, in order to distinguish between drug and NHP. In a sensitivity analysis, the denominator included all algorithm-recoded inputs, regardless of gold standard coding, for a more conservative estimate that cannot differentiate between drug and NHP.
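The two denominators can be illustrated with a small Python sketch. The record structure and field names are hypothetical, and agreement is simplified to equality of two codes, which are assumed to already be expressed at the generic drug or NHP level so that differing commercial products of the same generic count as agreement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecodedInput:
    algorithm_code: str        # code assigned by the algorithms
    gold_code: Optional[str]   # None when no gold standard could be established


def proportion_correct(records: List[RecodedInput], include_without_gold: bool) -> float:
    """Proportion of algorithm-recoded inputs that agree with the gold standard.

    include_without_gold=False: primary analysis (gold standard available only).
    include_without_gold=True: sensitivity analysis (all algorithm-recoded inputs).
    """
    if include_without_gold:
        denominator = records
    else:
        denominator = [r for r in records if r.gold_code is not None]
    if not denominator:
        return float("nan")
    correct = sum(
        1 for r in denominator
        if r.gold_code is not None and r.algorithm_code == r.gold_code
    )
    return correct / len(denominator)
```

Because inputs without a gold standard can never count as correct, the sensitivity estimate is always less than or equal to the primary estimate.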