Among the Chatbots evaluated, ChatGPT3 demonstrated the best performance, identifying 15 (71%) of the research priorities. Despite not being connected to the internet and having a knowledge cut-off of November 2021, ChatGPT3 showcased its ability to leverage its deep understanding of human language to provide insightful responses. These findings suggest that ChatGPT3 may be a valuable tool for researchers and healthcare professionals, providing a framework for generating important research questions. Nevertheless, it is crucial to acknowledge that this iteration of ChatGPT has constraints in terms of its knowledge base. It is also worth highlighting that when asked whether it had access to the JLA priorities, it explicitly denied having such access. Furthermore, more recent iterations such as ChatGPT4, equipped with internet connectivity, might provide a broader range of information; however, ChatGPT4 is a subscription-based, paid-for service.
In contrast, Bard, developed by Google, had access to the internet but identified only nine (43%) research priorities. Its performance was stronger for TKA, with poorer responses regarding THA. This discrepancy highlights the variation in Chatbot performance depending on the specific context or query, emphasising the need for further investigation to understand and address these limitations. Additionally, although Bard is capable of providing sources for the information it presents, it did not produce any. Interestingly, when asked whether it had access to the JLA priorities, it produced a response listing the 10 priorities together with links to the website where they can be accessed.
It is important to note that Bard may have a limited ability to recall previous conversations, as mentioned in the Bard FAQ.[27] The extent of Bard's memory capabilities remains uncertain; this could explain why, despite acknowledging access to the JLA priorities, the responses Bard gave after initial prompting were poorly aligned with our research priorities.
While Bing's answers were not as comprehensive as those of ChatGPT3, they offered credible and relevant references to our ‘prompts’. Examples include a prospective study on enhanced recovery programmes,[28] the impact of COVID-19 on lower limb arthroplasty mortality and morbidity,[29] a review of the lower limb arthroplasty literature assessing publication bias,[30] and a review of the literature on rheumatoid arthritis and lower limb arthroplasty.[31] However, the same references were reused in response to our follow-up ‘prompts’. These findings indicate that Bing's responses may be limited when generating novel information and offering new references. Nevertheless, the references provided by Bing offer researchers valuable additional resources, enabling them to augment their search and gain increased confidence in the obtained results. In addition, questions remain as to why there is a significant gap in the detail provided by Bing compared with ChatGPT3, despite Bing being powered by the more recent LLM, GPT4.
All three Chatbots generated responses related to research questions that were not mentioned in our comparator (JLA). ChatGPT3 specifically emphasised the need for further investigation in areas such as implant infection prevention and the utilisation of emerging technologies such as AI. Bard also identified research questions, including the need to mitigate complications following arthroplasty; however, Bard's responses were often generic and not directly relevant to the provided prompt or our research question. Similarly, Bing presented various responses that did not align with our identified research gaps. Although these responses were sourced from credible references, they failed to address the specific research priorities we were focusing on. However, it is crucial to acknowledge that JLA priorities undergo extensive filtering in the early stages before publication. Therefore, while the responses generated by these Chatbots may not align with the final published versions, they might still contain valuable content that could have been utilised during the development stages.[13]
Despite the potential benefits of Chatbots, our analysis revealed that none of the Chatbots addressed five key priorities pertaining to hip and knee arthroplasty. The absence of attention to these key priorities raises concerns about the effectiveness and comprehensiveness of utilising Chatbots in the context of hip and knee arthroplasty research. It suggests that the current implementation of Chatbots may not fully realise their potential to bridge gaps in knowledge or incorporate expert opinion in this specialised area. Despite this, Chatbots do demonstrate promising potential as a valuable adjunct to expert opinion and could be highly beneficial in facilitating the generation of a comprehensive list of ideas.
To the best of our knowledge, no previous research has explored the use of Chatbots for generating research ideas in the field of orthopaedic surgery. A recent study assessed the ability of a Chatbot LLM to identify research questions within the domain of gastroenterology.[32] This study prompted ChatGPT3 to generate responses related to four key research areas in gastroenterology. The responses were subsequently evaluated by a panel of five individuals, comprising three gastroenterology consultants and two AI experts. By comparison, our comparator (JLA) represents a rigorous, externally recognised, multi-stakeholder process. Nevertheless, the authors identified the potential of using ChatGPT to identify research gaps in other specialities.
In another study, researchers explored the capability of ChatGPT3 to simulate a Google search that a potential patient might perform.[19] They then compared the answers generated by ChatGPT3 with the results obtained from an actual Google web search. By evaluating the responses produced by ChatGPT3 against those obtained from conventional search engines such as Google, researchers have begun assessing the potential of AI Chatbots to deliver pertinent and precise information in clinical contexts.
The JLA PSP requires significant resources, including the review of systematic reviews and other studies, with experts dedicating considerable time to the process. By incorporating Chatbots, the process could potentially be accelerated and resource utilisation reduced. A substantial portion of time is devoted to the initial scoping exercise, systematic reviews, and scoping surveys. This research suggests that automating this stage could generate essential themes that could then be thoroughly discussed by an expert panel, ultimately leading to the formulation of specific research questions.
This study pioneers the utilisation of three Chatbots to identify research gaps within the realm of lower limb arthroplasty. While previous literature has explored the potential of Chatbots in scientific research, specifically in the context of paper writing and draft editing,[10] our study uniquely explores their capacity as research idea generators. Other studies have examined the use of ChatGPT3 in medical school exams in the USA[33] and in general surgery board exams in Korea.[34] Further studies have shed light on the reliability of content generated by Chatbots when summarising medical literature reviews, highlighting the concerning possibility of Chatbots producing unfounded statements or even fabricated information.[35] Similar instances have been observed outside healthcare, as exemplified by a recent court case in which a legal representative submitted fabricated citations generated by ChatGPT3 in support of their argument.[36]
Strengths and Limitations
One of the key strengths of our paper is the utilisation of JLA priorities as a comparator, which distinguishes it from other studies that rely solely on expert opinion. By incorporating JLA priorities, we established research priorities prior to conducting our study.
The dynamic and iteratively changing nature of Chatbots, together with the potential for different answers to be generated by other users or at different times, makes the methodology presented in this study challenging to reproduce; our results may therefore not be reproducible.
We must also acknowledge the limited number of ‘prompts’ used to generate responses for each Chatbot, and that the ‘prompts’ used in this study may not have been the most efficient or effective for obtaining a comprehensive picture of the gaps in research for hip and knee arthroplasty. Chatbot LLMs draw from vast amounts of information, making consistent output difficult to ensure. Future studies or processes utilising Chatbot LLMs for similar purposes should carefully consider these limitations and potential variations in results.
It is prudent to recognise the potential drawbacks of employing Chatbots to generate research ideas. Chatbots rely on LLMs trained on extensive data drawn from both reliable and unreliable sources, and LLMs may lack the ability to differentiate between the reliability of these sources. Consequently, there is a risk of incorporating inaccurate or misleading information into the research ideas generated by Chatbots. Moreover, LLMs have the potential to amplify existing biases present in the literature: since they learn from the data they are trained on, which might contain inherent biases, the output produced by Chatbots could inadvertently reinforce or perpetuate these biases.[37] Additionally, the reasoning behind the outputs generated by Chatbots remains opaque. While Chatbots can generate responses and ideas, the internal workings and decision-making processes behind those outputs are not readily understandable.
This lack of transparency makes it challenging to critically evaluate or validate the ideas suggested by Chatbots. Enhancing the transparency and explainability of Chatbot LLMs would enable researchers to better understand the reasoning behind the generated ideas and evaluate their suitability for further investigation.[38]