Summary of Findings and Their Significance
From the 22 articles selected for inclusion in the study, we identified 31 key items describing the information needs of dementia patients and their caregivers. Based on these items, we formulated 118 questions reflecting the wide range of needs expressed on this topic and posed them to ChatGPT. The responses were then analyzed by formal caregivers (neurologists and expert nurses) in semi-structured interviews. Despite the widespread recognition of ChatGPT's remarkable ability to answer questions across various subjects, it showed clear limitations when responding to clinical inquiries, and in some instances its responses were incorrect. Formal caregivers, in particular, expressed reservations about relying solely on this tool to meet their information needs; they preferred content that had been provided or approved by a human expert, such as a medical doctor.
Informal caregivers, however, did see value in ChatGPT for addressing their information needs and generally considered it quite helpful. Formal caregivers rated ChatGPT's capability lower (mean 3.13 ± 0.65) than informal caregivers did (3.77 ± 0.98). The difference between the two mean scores was smaller for non-clinical issues (e.g., insurance matters or the helpful experiences of other caregivers) than for clinical ones (e.g., treatment, prognosis, current medication, genetic aspects), suggesting that formal caregivers were less satisfied than informal caregivers with the answers to more specialized questions.
Comparisons with Existing Literature
Several recent empirical studies have demonstrated that ChatGPT can provide accurate and up-to-date information on a wide range of topics, owing to the breadth of data sources used in the development of the underlying large language model [33–35]. On medical topics, however, the chatbot does not appear to perform as well. This may be because medicine, and especially detailed medical questions, demands both deep domain knowledge and higher-order thinking and problem solving, which artificial intelligence has yet to sufficiently master [36]. ChatGPT has the knowledge, but it does not yet think, at least not in the way people do.
Indeed, numerous studies evaluating ChatGPT's performance in medical domains have consistently shown that it is unreliable in providing clinical information, particularly regarding chronic diseases. Consequently, at present, it appears more suitable for addressing basic inquiries, such as those unrelated to clinical aspects. This limitation becomes evident beyond common ailments like the flu or a cold, as ChatGPT's effectiveness diminishes once genuine data processing and critical thinking are required. When it has access to a comprehensive knowledge base that precisely describes a disease, ChatGPT performs well and can effectively enhance understanding of that disease; however, when confronted with complex data-processing tasks, or when the relevant knowledge is indistinguishable from background noise, it is unlikely to be of substantial assistance.

Skyler et al.'s (2023) study evaluated ChatGPT's ability to respond to myths and common misconceptions about cancer and found its responses more accurate than those obtained from the National Cancer Institute (NCI) [37]. In other studies, ChatGPT has been regarded as a tool with the potential to refine personalized medicine and to improve health literacy by providing accessible and understandable health information to the general public [38–40]. These insights are further substantiated by the findings of the present study.
In the future, large language models and the chatbots they power, such as ChatGPT, may become capable of functioning independently in medical contexts without human assistance. To achieve this, however, a quality measure should be implemented that enables the chatbot to assess the reliability and accuracy of its own responses. If its certainty in the quality of a response is low, the bot would withhold the answer and instead say something like, “I am sorry, my current knowledge is insufficient to respond to your question. Please ask another one.” It could also be beneficial to let users set their own quality threshold for the bot's responses; a minimal sketch of such a gate is given below. This represents an area for further exploration and development when deploying intelligent bots in sensitive domains like medicine.

Another critical issue is the ability of such chatbots to generate false knowledge because they fail to truly understand the meaning of the language they process, a phenomenon known as “stochastic parroting” [41]. Based on the experience of one of the authors (MK) during discussions with ChatGPT about programming and data science, it appears that in some situations ChatGPT tries to create knowledge. For example, when asked how to code a particular task in Python, the bot produced functions that do not exist in Python, although they exist in other programming languages; a hypothetical illustration of this failure mode is given below. Initially, the code appeared acceptable, but it ultimately proved ineffective and incorrect.

Considering the findings from various studies, professional opinions, and discussions, it can be argued that non-professionals, such as informal caregivers of dementia patients, should refrain from using AI bots without consulting a healthcare professional, except for simple questions that do not require clinical knowledge [42, 43]. In such cases, the purpose of using the bot becomes questionable: if one needs to consult an expert anyway, why rely on the bot in the first place? Directly seeking information from an expert would likely yield accurate and timely responses, without the need to assess the bot's output. Thus, one might consider abandoning the idea of using the bot altogether.
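The original exchange is not reproduced here, so the following Python fragment is only a hypothetical reconstruction of the kind of error described above: a removeAll method borrowed from Java's List interface that looks plausible on a Python list but does not exist and fails at runtime. The task and function name are our illustrative choices, not the original prompt.

```python
values = [1, 2, 3, 2, 2, 4]

# Plausible-looking but invented: Python lists have no removeAll()
# (Java's java.util.List does). Uncommenting this raises AttributeError.
# values.removeAll(2)

# Idiomatic Python instead: rebuild the list, keeping only wanted elements.
values = [v for v in values if v != 2]
print(values)  # [1, 3, 4]
```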
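As for the quality measure proposed above, the following is a minimal sketch of how a confidence gate might be wired around a model, under explicit assumptions: ChatGPT exposes no calibrated self-assessed confidence today, so the generate callable, the Candidate type, and its confidence field are illustrative inventions rather than an existing API. The threshold parameter reflects the user-configurable quality measure suggested earlier.

```python
from dataclasses import dataclass
from typing import Callable

REFUSAL = ("I am sorry, my current knowledge is insufficient to respond "
           "to your question. Please ask another one.")

@dataclass
class Candidate:
    text: str
    confidence: float  # self-assessed reliability in [0, 1]; assumed, not a real ChatGPT field

def gated_answer(question: str,
                 generate: Callable[[str], Candidate],
                 threshold: float = 0.8) -> str:
    """Return the model's answer only when its self-assessed confidence
    clears the (user-configurable) threshold; otherwise refuse."""
    candidate = generate(question)
    return candidate.text if candidate.confidence >= threshold else REFUSAL

# Toy usage with a stand-in model; a real system would need a calibrated
# confidence estimate (e.g., derived from token log-probabilities).
def toy_model(question: str) -> Candidate:
    return Candidate(text="Donepezil is commonly prescribed ...", confidence=0.42)

print(gated_answer("What medication is used in advanced dementia?", toy_model))
# Prints the refusal message, because 0.42 < 0.8
```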
However, there is still value in exploring and developing these bots further. We are in the early stages of their creation, and the optimal configuration for different scenarios is still unknown [44]. For instance, it might be advisable to configure bots involved in medical discussions, especially those related to complex clinical topics, not to generate new knowledge, at least for now. Although this may change in the future, comprehensive, quasi-clinical experiments are necessary to determine whether enabling bots to create knowledge yields valuable insights rather than erroneous or misleading information. It is possible that bots should assess their own knowledge against various criteria and provide answers only when the assessment indicates high confidence. Without such experiments, using these bots in an unrestricted manner would be very risky. Addressing these challenges is more complex than it may appear and requires the involvement of a formal governing body. The nature of this body, whether national, international, or based on regional entities such as the European Union, needs careful consideration. Legislation must advance rapidly, but this is only feasible alongside parallel sociological developments.

Our findings indicate a significant difference between the perspectives of formal and informal caregivers concerning clinical and specialized aspects of dementia care, such as treatment, prognosis, current medication, current research, and genetic aspects. While formal caregivers did not find ChatGPT's answers convincing, reliable, or comprehensive in these areas, informal caregivers considered the same responses helpful and of higher quality. In the interviews, specialists emphasized the need for a healthcare consultant to verify ChatGPT's answers. Interestingly, despite this, informal caregivers regarded ChatGPT as superior to previous tools and methods in meeting their information needs, primarily because of its prompt responses.

Overall, this study illustrates that ChatGPT can be a helpful tool for addressing basic information needs related to advanced dementia patients, but it should probably not be used to seek more specialized, clinical knowledge. As an artificial intelligence large language model, ChatGPT can handle a wide range of questions and provide quick and often accurate answers. Additionally, it is available 24 hours a day, 7 days a week, making it a convenient option for users who need information outside normal business hours. However, it is important to note that ChatGPT is not infallible, and there may be instances where the information it provides is not entirely accurate or appropriate [41]. Its answers about medical issues are often imperfect, and in clinical cases a physician's expertise and judgment are necessary. Until ChatGPT becomes a much better “thinker,” it should not be used as the sole source of information about advanced topics related to dementia and the care of dementia patients.
Strengths and Limitations of This Study
In this study, for the first time, ChatGPT's answers to patients' questions were analyzed and evaluated by people (formal and informal caregivers) using both quantitative and qualitative approaches. This evaluation offers valuable insights for enhancing ChatGPT's performance in this domain. However, it is important to acknowledge the limitations of this study. First, the sample size was small, consisting of only 30 caregivers. Second, there was limited diversity among the participants: all were from Iran and resided in the same large city. This restricts the generalizability of the findings and prevents us from capturing the perspectives of caregivers from small towns and villages. Conducting the study with a larger and more diverse population would yield more reliable and representative results.
In future research, expanding the participant pool to include a larger and more diverse sample would provide a more comprehensive understanding of caregivers' perceptions and experiences with ChatGPT. It would also be beneficial to explore the perspectives of caregivers from different cultural backgrounds and geographical locations, as this could uncover additional insights and potential variations in their assessments of ChatGPT's performance.