Recent efforts to analyze social media text for studying PM and drug abuse can be categorized into three groups on the basis of the methodology employed: (i) manual analysis; (ii) unsupervised analysis; and (iii) automatic classification using supervised machine learning. In most early works, researchers proposed and tested hypotheses via manual analyses of social media content (e.g., examining opioid chatter from Twitter to determine the presence of self-reports of misuse/abuse). Chan et al.(20) used two weeks of Twitter data to manually code users’ messages and message contexts (personal versus general experiences). Similarly, Shutler et al.(21) qualitatively examined tweets mentioning prescription opioids to determine whether they represented abuse or non-abuse, whether they were characterizable, and what connotation they carried (positive [i.e., analgesic use], negative [i.e., adverse event], or non-characterizable). The second approach, unsupervised analysis, has been popular for finding trends in large social media datasets, for example by applying topic modeling via LDA(22) to identify topics associated with selected drugs. However, past research(9) demonstrated that only a small proportion of the data may contain abuse-related information, so unsupervised methods are likely to be heavily influenced by unrelated content, and the conclusions drawn from them may be unreliable or overgeneralized. When working with general social media data, developing and applying a robust supervised classification approach before topic modeling or trend analysis is methodologically more sound and can improve the reliability of the conclusions drawn.(9)
The third approach, supervised machine learning (particularly automatic text classification), enables researchers to overcome the problems associated with unsupervised methods by filtering out unrelated content. Supervised methods do, however, require high-quality, manually annotated datasets for training; if the trained models show promising results, they can then be applied to large datasets to curate relevant data automatically. Multiple distinct approaches have been attempted for automatically detecting drug abuse/misuse from social media chatter. For example, Jenhani et al.(19) developed a hybrid approach combining linguistic rules and machine learning to detect drug-abuse-related tweets automatically. In our past work,(16) we investigated the potential of social media as a resource for the automatic monitoring of prescription drug abuse by developing an automatic classification system to distinguish possible-abuse from no-abuse posts. In some studies,(23,24) deep learning models were developed to detect drug abuse risk behavior using two datasets: the first was manually annotated, a deep learning model trained on it was used to annotate the second automatically, and both datasets were then used to train the final deep learning model. Some studies have used social media sources other than Twitter,(25) employing machine learning methods (LR, SVM, and RF) to determine whether a Reddit post was about opioid use disorder recovery. Despite the potential of supervised classification approaches, our recent review of the topic(9) showed that significant improvements in the performance of current systems are needed to effectively utilize social media data for PM abuse monitoring.
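A traditional supervised baseline of the kind discussed above (e.g., LR over text features) can be sketched as follows. The tiny labeled set and the TF-IDF + logistic regression pipeline are illustrative assumptions; they stand in for the cited systems rather than reproducing any of them.

```python
# Sketch: a TF-IDF + logistic regression baseline for abuse vs. non-abuse
# classification. Training examples and labels are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "crushed my xanax to get high",            # potential abuse
    "popping percs all night",                 # potential abuse
    "doctor prescribed tramadol for my knee",  # medical use
    "picked up my refill at the pharmacy",     # medical use
]
train_labels = ["abuse", "abuse", "non-abuse", "non-abuse"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

pred = clf.predict(["my doctor prescribed oxycodone"])
print(pred[0])
```

In practice, such baselines are trained on thousands of manually annotated posts and, once validated, applied to large unlabeled collections to curate relevant data automatically.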
In this paper, we model the automatic detection of nonmedical PM use from Twitter data as a supervised classification problem and present the development of a state-of-the-art classifier that outperforms previously reported systems. Our proposed classifier is based on bidirectional encoder representations from transformers (BERT),(26) a context-preserving language representation methodology that has considerably advanced the state of the art in sentence classification, inter-sentence classification, information extraction (named entity recognition), question answering, and other NLP tasks.(26,27) When BERT is pretrained on large unlabeled corpora, it captures contextual semantic information in the underlying vector representations, which can then be fine-tuned for downstream NLP applications. BERT’s key technical innovation is applying bidirectional training of the transformer architecture to language modeling, leading to improved generalizability and a deeper representation of word meaning, context, and discourse flow. BERT-based models represent a significant improvement over past state-of-the-art models, which were primarily based on word2vec,(28) because they encode words or text fragments in a context-sensitive way: the same text fragment receives different representations when it appears in different contexts. We also propose fusion learning among multiple BERT-like models to capture additional patterns that may improve classification performance. On a publicly available Twitter dataset with four classes,(29) our best-performing fusion-based model detects PM-abuse-related posts significantly better, with an F1-score of 0.67 (95% CI: 0.64–0.69), than the best traditional classification model, which obtains 0.45 (95% CI: 0.42–0.48).
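One simple way to realize fusion among multiple classifiers is late fusion: averaging the per-class probabilities that each model assigns to a post and predicting the class with the highest fused probability. This is a common strategy, sketched below with stand-in probability arrays in place of real BERT outputs; it is one plausible instantiation, not necessarily the exact fusion method used in this work.

```python
# Sketch: late fusion by averaging per-class probabilities from two models
# (e.g., two fine-tuned BERT variants). Arrays are stand-ins for real outputs.
import numpy as np

# Each row is one tweet; each column is one of the four classes.
probs_model_a = np.array([[0.6, 0.2, 0.1, 0.1],
                          [0.3, 0.4, 0.2, 0.1]])
probs_model_b = np.array([[0.5, 0.3, 0.1, 0.1],
                          [0.1, 0.6, 0.2, 0.1]])

# Fuse by simple averaging, then pick the highest-probability class per tweet.
fused = (probs_model_a + probs_model_b) / 2.0
predictions = fused.argmax(axis=1)
print(predictions)  # → [0 1]
```

Averaging tends to smooth out idiosyncratic errors of individual models, which is one intuition behind fusing multiple BERT-like (and BiLSTM) models.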
We present an analysis of the system errors to better understand the limitations of BERT-like models, and we recommend future research directions for further improving classification performance on the non-majority PM abuse class. A summary of our contributions is provided below:
- We propose BERT-based models and fusion learning, both among multiple BERT-like models and between BERT-like and deep learning (BiLSTM) models, to enhance classification performance for PM abuse detection/classification.
- We present extensive performance comparisons of several baseline machine learning methods, including deep learning, against BERT-like models and fusion learning models using a publicly available Twitter PM abuse dataset.
- We present empirical analyses of BERT-based models and a discussion of their advantages and drawbacks for social media text classification in general and PM abuse detection in particular.