The increasing integration of artificial intelligence (AI) into healthcare has sparked both excitement and concern, particularly regarding the use of generative AI (Gen-AI) systems for providing medical advice. Among the various applications, one of the most notable is the recommendation of over-the-counter (OTC) medications. These systems are designed to assist consumers by offering personalized advice based on the symptoms and conditions they describe, potentially improving access to healthcare information. However, the reliability and consistency of these recommendations are critical, given their potential impact on public health.
Historically, OTC medication advice has been provided by healthcare professionals, including pharmacists and physicians, who rely on their expertise and experience to recommend appropriate treatments. The shift toward AI-driven recommendations is motivated by the promise of greater accessibility and efficiency. Gen-AI systems, trained on extensive medical databases and employing advanced natural language processing techniques, can in principle provide rapid and accurate advice. However, the quality of this advice must be rigorously evaluated to ensure that it meets the necessary standards of accuracy, safety, and reliability.
Several studies have examined the general capabilities of AI in healthcare, highlighting both the potential benefits and the risks associated with its use. Issues such as data quality, algorithmic bias, and the contextual understanding of AI systems have been identified as significant challenges. These concerns are particularly pertinent in the context of OTC medications, where incorrect or inappropriate recommendations can lead to adverse health outcomes. To address these concerns, this study aims to answer two pivotal research questions: Is the quality of over-the-counter medication recommendations by commercially available Gen-AI systems adequately reliable? Is there a significant difference in the quality of over-the-counter medication recommendations between two commercially available Gen-AI systems?
By systematically evaluating the performance of Gen-AI-based systems, this research seeks to provide a thorough assessment of their reliability and consistency. This includes analyzing the accuracy of the recommendations by comparing each recommended medication with the purpose identified for that OTC medication by the FDA, which serves as the basis for judging adherence to established medical guidelines. Additionally, by comparing different Gen-AI systems, the study aims to identify any significant differences in the quality of their recommendations, thereby offering insights into the variability and robustness of AI-driven healthcare solutions.
The findings of this research will help clarify the readiness and role of AI in healthcare, providing valuable information for consumers, healthcare providers, and policymakers.
Research Questions
Despite the growing use of Gen-AI systems for OTC medication advice, the reliability and quality of their recommendations remain under scrutiny, raising significant concerns about efficacy and safety. This study seeks to address the following critical research questions:
Q1. Is the quality of over-the-counter medication recommendations by commercially available Gen-AI systems adequately reliable?
Q2. Is there a significant difference in the quality of over-the-counter medication recommendations between two commercially available Gen-AI systems?
By investigating these questions, this research aims to determine the reliability of Gen-AI-based systems and identify improvement opportunities.
Hypotheses
H1o. There is no statistically significant difference in the quality of over-the-counter medication recommendations between the FDA database-based system and a commercially available Gen-AI system.
H1a. There is a statistically significant difference in the quality of over-the-counter medication recommendations between the FDA database-based system and a commercially available Gen-AI system.
H2o. There is no statistically significant difference in the quality of over-the-counter medication recommendations between two commercially available Gen-AI systems.
H2a. There is a statistically significant difference in the quality of over-the-counter medication recommendations between two commercially available Gen-AI systems.
Nature of the Study
This study uses a comparative, quantitative research design. Its primary aim is to evaluate the reliability and quality of over-the-counter (OTC) medication recommendations provided by commercially available generative artificial intelligence (Gen-AI) systems. Such an evaluation is an essential first step toward determining, and hence ensuring, that AI-driven healthcare solutions meet the necessary standards for patient safety and efficacy. By comparing two commercially available Gen-AI systems against the FDA database of OTC medications, the research aims to quantify and analyze the reliability and quality of their OTC medication recommendations.
OTC medication recommendations were collected from two different commercially available Gen-AI systems. The prompts were built from a set of common health conditions for which OTC medications are typically recommended. A diverse range of conditions, such as cold, headache, fever, and allergies, was included to ensure a comprehensive evaluation. Multiple recommendations were collected for each health condition to account for variability in the systems' responses. The collected data were analyzed using statistical methods to compare the quality of recommendations between the two Gen-AI systems, and an inferential test (the chi-square test) was employed to identify significant differences between them. The study focuses on the quality of OTC medication recommendations and does not cover prescription medications or other types of healthcare recommendations.
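The chi-square comparison described above can be illustrated with a minimal sketch. It assumes that each recommendation has been coded as either consistent or inconsistent with the FDA-listed purpose of the medication, which gives a 2x2 table of counts for the two systems. That coding scheme, the function name `chi_square_2x2`, and all counts below are hypothetical and are not taken from the study's data. The p-value uses the closed form for one degree of freedom, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    table: [[a, b], [c, d]], rows = systems, columns = outcome categories.
    Returns (chi2 statistic, p-value). No continuity correction is applied,
    and every expected count is assumed to be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of no difference
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (illustration only, not study results):
# rows = Gen-AI system A, Gen-AI system B
# columns = consistent with FDA purpose, inconsistent with FDA purpose
counts = [[40, 10],
          [30, 20]]

chi2, p_value = chi_square_2x2(counts)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
print("reject H2o at alpha = 0.05" if p_value < 0.05 else "fail to reject H2o")
```

With more than two outcome categories the table would be larger and the degrees of freedom would change, so the one-degree-of-freedom shortcut for the p-value would no longer apply.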
Significance of the Study
The findings of this study are intended to offer insights for consumers and healthcare providers regarding the reliability of AI recommendations for over-the-counter drugs when the models are used as provided, without any manipulation. These insights will help determine whether such AI systems are suitable for providing medical recommendations on their own or whether they require the oversight of domain experts to ensure accuracy and reliability.