Human Papillomavirus (HPV) is a group of viruses that infect the skin and mucous membranes, with over 100 types identified [1]. HPV is primarily transmitted through sexual contact and can infect the genital area, leading to genital warts and various cancers, including cervical, anal, penile, vaginal, vulvar, and oropharyngeal cancers [2], [3], [4]. Among these, cervical cancer stands out as the most common HPV-related cancer and a leading cause of cancer-related deaths in women worldwide, contributing to an estimated 266,000 cervical cancer deaths annually due to HPV infection [5], [6], [7], [8], [9]. This burden is especially pronounced in low- and middle-income countries where access to screening and treatment is limited [5].
As with other infectious diseases, the development of HPV vaccines has been a significant advance in preventive healthcare [10], [11], [12]. HPV vaccines primarily target HPV types 16 and 18, which are responsible for approximately 70% of cervical cancers and a significant proportion of other HPV-related cancers [13]. By preventing HPV infection, these vaccines can effectively reduce the incidence of HPV-related diseases, including cervical cancer [13]. Clinical trials have demonstrated the high efficacy of HPV vaccines in preventing HPV infection and related diseases [14]. Furthermore, population-based studies have shown a substantial decline in HPV infections and HPV-related outcomes in countries with high HPV vaccination coverage, highlighting the real-world effectiveness of these vaccines [15]. Overall, HPV vaccines are a crucial tool in the prevention of HPV-related diseases, particularly cervical cancer. Widespread vaccination has the potential to significantly reduce the burden of HPV-related cancers and improve the overall health outcomes of populations globally [16].
Despite the proven benefits of HPV vaccination, there are various concerns and forms of hesitancy surrounding its use [17]. Some individuals and communities are hesitant due to insufficient or inadequate information about HPV vaccination, or due to misinformation about the vaccine's safety and efficacy spread through social media and other channels [18], [19], [20]. Concerns about the long-term effects of the vaccine and its perceived necessity for individuals who may not consider themselves to be at high risk for HPV-related diseases also contribute to hesitancy [20]. Additionally, cultural or religious beliefs, distrust of pharmaceutical companies, and concerns about the vaccine's affordability and accessibility in low-resource settings can all play a role in vaccine hesitancy [21]. Addressing these concerns through accurate information, targeted education campaigns, and improved access to vaccination services is crucial to increasing HPV vaccination rates and reducing the burden of HPV-related diseases.
Traditionally, question answering (QA) systems have been developed using rule-based approaches, information retrieval techniques, deep learning-based approaches, or hybrid methods [22], [23]. Rule-based QA systems rely on predefined rules and patterns to extract relevant information from a knowledge base or document collection in response to a question [24]. Tsampos and Marakakis, for example, developed a rule-based medical question answering system in Python using spaCy for natural language processing and Neo4j for graph database management [25]. They used Cypher queries to retrieve information from the graph database to answer user questions, and the system can handle complex questions by searching for relations between remote nodes and using synonyms to match nodes or paths [25]. Cairns et al. developed MiPACQ, a rule-based question answering system, by first retrieving candidate answer paragraphs using a paragraph-level baseline system based on the Lucene search engine [26]. The paragraphs were then re-ranked using a fixed formula that incorporated semantic annotations from the MiPACQ annotation pipeline [26]. This method utilized a scoring function that combined original paragraph scores with bag-of-words and UMLS entity components, ensuring that relevant paragraphs were prioritized for better question answering performance [26]. Information retrieval-based QA systems use keyword matching and ranking algorithms to retrieve documents or passages likely to contain the answer [27]. For example, Guo et al. developed a retrieval-based medical question answering system that efficiently retrieves answers using Elasticsearch and enhances them with semantic matching and knowledge graphs [28]. The system's novel siamese-based answer selection architecture outperformed baseline models and systems in both Chinese and English datasets, demonstrating consistent improvements in quantification and qualification evaluations [28]. 
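The retrieval-based scoring described above, where candidate passages are ranked by a function combining query-term overlap with corpus statistics, can be illustrated with a minimal sketch. The corpus, query, and TF-IDF weighting below are illustrative assumptions, not the actual code of the cited systems (MiPACQ's Lucene pipeline or Guo et al.'s Elasticsearch system), which use more sophisticated ranking and semantic components.

```python
import math
from collections import Counter

def tf_idf_score(query, passage, corpus):
    """Score a passage against a query with a simple smoothed TF-IDF overlap."""
    q_terms = query.lower().split()
    p_counts = Counter(passage.lower().split())
    n_docs = len(corpus)
    score = 0.0
    for term in q_terms:
        tf = p_counts.get(term, 0)                                   # term frequency in passage
        df = sum(1 for doc in corpus if term in doc.lower().split()) # document frequency
        idf = math.log((n_docs + 1) / (df + 1)) + 1                  # smoothed inverse doc freq
        score += tf * idf
    return score

def answer(query, corpus):
    """Return the passage with the highest retrieval score."""
    return max(corpus, key=lambda p: tf_idf_score(query, p, corpus))

# Toy corpus for illustration only.
corpus = [
    "The HPV vaccine prevents infection by HPV types 16 and 18.",
    "Cervical cancer screening uses the Pap smear.",
    "Influenza vaccines are updated annually.",
]
print(answer("which HPV types does the vaccine prevent", corpus))
```

Real systems replace this bag-of-words score with BM25-style ranking plus semantic re-ranking (e.g., UMLS entity matching or siamese answer-selection networks), but the retrieve-then-rank structure is the same.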
Deep learning-based QA systems have emerged as a more flexible and adaptable approach, leveraging powerful neural network architectures to automatically learn to understand and respond to questions [29]. Yin et al. developed Evebot, a conversational system for detecting negative emotions and preventing depression through positive suggestions [30]. It uses deep-learning models, including a Bi-LSTM for emotion detection and an anti-language sequence-to-sequence neural network for counseling [30].
While these traditional QA systems have been effective for certain types of questions and domains, they have several limitations. One major limitation is the reliance of rule-based approaches on predefined rules or keywords, which makes them less flexible and adaptable to new or complex questions [31]. These systems also struggle with understanding natural language queries and context, often leading to inaccurate or incomplete answers. Additionally, traditional QA systems are limited by the quality and coverage of their underlying knowledge base or document collection, which can affect the accuracy and relevance of their answers [32]. For deep learning-based QA systems, one major limitation is their dependency on large amounts of labeled training data [33], [34], [35], [36]. These systems require vast datasets to learn patterns in language and develop accurate models, which can be challenging and resource-intensive to obtain, especially for specialized domains or languages [33]. Additionally, deep learning-based QA systems may struggle with out-of-domain or adversarial examples, where the input falls outside the scope of the training data, leading to errors or inaccurate responses [29], [37], [38].
Another limitation of traditional QA systems is their inability to provide explanations or reasoning behind their answers [39]. These systems typically return a single answer without any supporting context or evidence, making it challenging for users to understand how the answer was derived [40], [41]. This lack of transparency can reduce user trust and confidence in the system, especially in critical applications such as healthcare or legal domains [42]. Overall, while traditional QA systems have been valuable in certain contexts, their limitations have led to the development of more advanced approaches.
In recent years, the advent of powerful language models, such as the Generative Pre-trained Transformer (GPT), has revolutionized the field of natural language processing (NLP) and opened up new possibilities for conversational agents [35], [43], [44], [45]. GPT, developed by OpenAI, is a state-of-the-art deep learning model capable of generating human-like text based on the input it receives [35], [43], [44], [45]. The latest iteration, GPT-4, is distinguished by its ability to learn from vast amounts of text data, supported by its billions of parameters, enabling it to capture complex patterns in language and generate highly coherent and informative text [46], [47, p. 4], [48]. However, a significant challenge with GPT models, including ChatGPT, is their tendency to produce hallucinations or responses that, while plausible, are factually incorrect [49]. This issue has raised concerns about the reliability of these models, especially in critical applications such as healthcare [50]. To address this problem, researchers and developers are investigating the use of well-curated knowledge bases (KBs) to refine the models. By integrating authenticated and reliable information from KBs, the goal is to enhance the model's capability to generate pertinent and accurate responses, thereby decreasing the risk of hallucinations. This has led to the development of chatbots and question answering systems powered by GPT that can provide information and assistance across various domains [48].
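The KB-grounding strategy described above, retrieving vetted snippets and constraining the model to answer only from that evidence, can be sketched as follows. The KB entries, overlap-based retriever, and prompt template are hypothetical simplifications; production systems use dense retrieval and an actual model call, which is omitted here.

```python
def retrieve(question, kb, k=2):
    """Rank KB snippets by naive word overlap with the question; keep top k."""
    q = set(question.lower().split())
    ranked = sorted(kb, key=lambda s: len(q & set(s.lower().split())), reverse=True)
    return ranked[:k]

def build_grounded_prompt(question, kb):
    """Assemble a prompt that constrains generation to the retrieved evidence."""
    evidence = retrieve(question, kb)
    context = "\n".join(f"- {snippet}" for snippet in evidence)
    return (
        "Answer using ONLY the evidence below. "
        "If the evidence is insufficient, say so.\n"
        f"Evidence:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical curated KB entries for illustration.
kb = [
    "HPV types 16 and 18 cause about 70% of cervical cancers.",
    "HPV vaccines are most effective when given before exposure.",
    "The Pap smear detects precancerous cervical changes.",
]
prompt = build_grounded_prompt("How effective are HPV vaccines?", kb)
print(prompt)
```

Restricting the model's context to authenticated KB snippets in this way is one practical mechanism for the hallucination reduction discussed above: the model is prompted to abstain when the curated evidence does not cover the question.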
In the context of healthcare, the potential of GPT-powered question answering systems and chatbots is particularly promising [51]. Seenivasan et al. developed an end-to-end trainable Language-Vision GPT (LV-GPT) model to leverage GPT-based LLMs for Visual Question Answering (VQA) in robotic surgery [52]. The LV-GPT model extends GPT2 to process vision input (images) by incorporating a vision tokenizer and vision token embedding [52]. The model outperforms other state-of-the-art VQA models on public surgical-VQA datasets and a newly annotated dataset, demonstrating its effectiveness in capturing context from both language and vision modalities [52]. Shi et al. developed a GPT-based question answering system for Fundus Fluorescein Angiography (FFA) with an image-text alignment module and a GPT-based interactive QA module [53]. The system showed satisfactory performance in automatic evaluation and high accuracy and completeness in manual assessments, facilitating dynamic communication between ophthalmologists and patients for enhanced diagnostic processes [53]. Although GPT-powered question answering systems and chatbots in healthcare hold significant promise, we found that these systems exhibit hallucination issues because they use pre-trained GPT models directly without fine-tuning [53]. In the case of HPV vaccination, where inadequate information and misconceptions are prevalent, leveraging fine-tuning techniques with advanced GPT models can significantly enhance the accuracy and reliability of information provided. A GPT-powered chatbot, when properly fine-tuned, could play a crucial role in educating the public and increasing awareness about the importance of vaccination.
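Fine-tuning a GPT model on domain Q&A typically means converting curated question-answer pairs into the chat-style JSONL training format. The sketch below shows this conversion step; the two Q&A pairs and the system message are invented for illustration, and a real training set would be drawn from a vetted HPV-vaccine knowledge base.

```python
import json

# Illustrative HPV-vaccine Q&A pairs (not real training data).
qa_pairs = [
    ("Does the HPV vaccine prevent cervical cancer?",
     "Yes. By preventing infection with HPV types 16 and 18, which cause "
     "about 70% of cervical cancers, the vaccine greatly reduces risk."),
    ("Who should get the HPV vaccine?",
     "Vaccination is most effective when given before exposure to HPV, "
     "so it is typically recommended for preadolescents."),
]

def to_finetune_records(pairs, system_msg):
    """Convert (question, answer) pairs into chat-format fine-tuning records."""
    return [
        {"messages": [
            {"role": "system", "content": system_msg},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        for question, answer in pairs
    ]

records = to_finetune_records(qa_pairs, "You answer HPV vaccine questions accurately.")
# One JSON object per line, as fine-tuning pipelines typically expect.
jsonl = "\n".join(json.dumps(r) for r in records)
print(len(jsonl.splitlines()), "training records")
```

Grounding each assistant turn in a curated source, rather than in free-form model output, is what lets fine-tuning improve factual reliability rather than merely style.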
In this paper, we present the development and evaluation of a GPT-powered chatbot (VaxBot-HPV) designed to provide information and answer questions about the HPV vaccine. We also describe the design and implementation of the chatbot, its capabilities and limitations, as well as its potential impact on public health.
Overall, this paper highlights the potential of GPT-powered question answering systems and chatbots in healthcare, particularly in the context of HPV vaccination, and demonstrates how these systems can be leveraged to improve health literacy and promote vaccination uptake.