Categorising UK Biobank Self-Reported Medication Data using Text Matching
Introduction
Information about medications taken by participants in large population and patient cohorts such as the UK Biobank (UKBB) study can come from several sources, including self-report and linkage to electronic health records. At the time of writing, only self-reported medication data from UKBB participants have been released, in the form of medication codes assigned during assessment centre interviews. Prescribing data are due to be released by UKBB in the near future. UKBB self-reported medication codes and descriptions have no published structure, so medications cannot be grouped directly into broader categories for clinical phenotyping. The motivation for the software project described here was to develop an automated means of classifying UKBB self-reported medications for clinical phenotyping purposes.
Methods
We describe software tools developed to match UKBB medication descriptions with terms from drug classification systems. The WHO's Anatomical Therapeutic Chemical (ATC) Classification System and the British National Formulary (BNF) were selected as classification systems, and each was matched separately against the descriptions of UKBB medication codes. Manual matches can be added where automatic matching either fails or gives an ambiguous result; manually matched codes must be taken from the target classification system.
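The matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lookup table, the manual match entry, and the function names `normalise` and `match_description` are assumptions made for illustration, and the matching rule (exact token lookup after normalisation) is a deliberately simple stand-in for whatever the released software does.

```python
import re

# Illustrative sample of classification terms mapped to ATC codes.
# A real run would load the full ATC or BNF term list instead.
ATC_TERMS = {
    "ramipril": "C09AA05",
    "atenolol": "C07AB03",
    "simvastatin": "C10AA01",
}

# Hypothetical manual matches, keyed by normalised description, for
# cases where automatic matching fails (here, a brand name).
MANUAL_MATCHES = {
    "zocor 10mg tablet": "C10AA01",
}


def normalise(description):
    """Lower-case the text and reduce it to space-separated alphanumeric tokens."""
    text = re.sub(r"[^a-z0-9 ]+", " ", description.lower())
    return " ".join(text.split())


def match_description(description):
    """Return (status, codes) for one medication description.

    status is 'manual', 'matched', 'ambiguous' or 'unmatched'.
    """
    norm = normalise(description)
    # Manual matches take priority over automatic matching.
    if norm in MANUAL_MATCHES:
        return "manual", [MANUAL_MATCHES[norm]]
    # Collect the distinct codes of every token found in the term list.
    codes = sorted({ATC_TERMS[tok] for tok in norm.split() if tok in ATC_TERMS})
    if not codes:
        return "unmatched", []
    if len(codes) > 1:
        return "ambiguous", codes
    return "matched", codes
```

In this sketch, descriptions that come back as 'unmatched' or 'ambiguous' are the candidates for manual matching against the target classification system.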
Results
Of the 3,646 medications reported as having been used by UKBB participants, 2,935 (80.5%) were matched with ATC system codes and 3,338 (91.6%) were matched with BNF codes. After manual matching of the medications that clinicians considered important, those remaining unmatched were, in general, either over-the-counter medicines with general descriptions or medications with very low participant report counts. In a case study, genetic associations between a Blood Pressure Genetic Risk Score and individual medications treated as phenotypes were less significant than the associations found once the medications had been automatically grouped.
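Grouping matched medications into a broader category can be sketched as follows. This is an assumption-laden illustration rather than the method used in the case study: the `MATCHED` table, the choice of a three-character ATC prefix as the grouping level, and the helper names `group_by_atc_prefix` and `has_group` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical output of the matching step: medication name -> ATC code.
MATCHED = {
    "ramipril": "C09AA05",
    "lisinopril": "C09AA03",
    "atenolol": "C07AB03",
}


def group_by_atc_prefix(matched, prefix_len=3):
    """Group medication names by the first prefix_len characters of their ATC code."""
    groups = defaultdict(set)
    for medication, code in matched.items():
        groups[code[:prefix_len]].add(medication)
    return dict(groups)


def has_group(participant_meds, group_meds):
    """Binary phenotype: True if the participant reports any medication in the group."""
    return any(med in group_meds for med in participant_meds)
```

A participant reporting any medication in a group then counts as a case for that group, so the grouped phenotype pools reports that would otherwise be split across single medication codes.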
Conclusion
In the case study, use of the matching software assisted in building medication-based phenotype proxies and increased association significance compared with single UKBB medication codes. The software is available for general use at https://github.com/PhilAppleby/ukbb-srmed/ and should also be applicable to other classification problems where descriptive text can be matched.
Posted 18 Dec, 2019