Chatbot usage has gained widespread popularity in recent years, particularly for business and customer service purposes (Følstad & Brandtzæg, 2017; Taylor et al., 2020). In many fields, including healthcare, finance, and e-commerce, chatbots have become a favored tool for interacting with consumers. Chatbots employ natural language to engage in effective interactions with users and to respond to their inquiries, and they are currently deployed across various domains to offer round-the-clock services. Chatbots can address at least 80% of typical customer queries, indicating a versatility that extends beyond merely handling inquiries (Adam et al., 2021). Primarily employed for sales-related activities, followed by support and marketing functions, chatbots have been reported to increase sales figures by an average of 67% (Press, 2019); moreover, one-fourth of all transactions involve interactions mediated by these automated conversational agents (Press, 2019). However, to fully leverage the potential of chatbots, their usability must be evaluated systematically; in other words, the effectiveness and usability of these chatbots remain a subject of inquiry.
Accordingly, the evaluation of user satisfaction with chatbots has become an important area of research for understanding how effectively such agents meet users’ needs and enable a positive experience (Jenneboer et al., 2022). Reliable and concise evaluation scales are available and widely used to assess the interaction quality perceived by users of a digital system, such as the System Usability Scale (SUS) (Brooke, 1996), the Usability Metric for User Experience (UMUX) (Finstad, 2010), and the UMUX-Lite (Lewis et al., 2013). The SUS is a “quick-and-dirty” questionnaire consisting of ten items rated on a five-point scale, which has been shown to have excellent psychometric properties. In contrast, the UMUX employs fewer items, in line with the International Organization for Standardization (ISO) definition of usability, and uses a seven-point scale ranging from strongly disagree to strongly agree. The UMUX-Lite is a two-item instrument with acceptable reliability, validity, and psychometric properties, making it a valuable tool for preliminary and rapid testing of user reactions to a prototype (Borsci et al., 2015). Nevertheless, while these tools have been widely used and validated for assessing user satisfaction with web interfaces, they may not fully capture the conversational aspects of user interaction with chatbots, since they were not designed for this purpose (Borsci, Malizia, et al., 2022). Similarly, as noted by Lewis (2016) and Lewis and Sauro (2020), scales to assess the experience with voice-controlled interfaces were developed in the past, for instance, the Mean Opinion Scale (Salza et al., 1996), the Subjective Assessment of Speech System Interfaces (Hone & Graham, 2000), and the Speech User Interface Service Quality scale (Polkosky, 2005). Nevertheless, voice-controlled interfaces are not comparable to chatbot systems.
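For illustration, the standard SUS scoring procedure referenced above can be sketched in a few lines of code (a minimal sketch; the function name and input format are our own, but the scoring rule is the published one: odd-numbered items contribute the rating minus one, even-numbered items contribute five minus the rating, and the sum is multiplied by 2.5 to yield a 0–100 score):

```python
def sus_score(responses):
    """Compute the standard SUS score (0-100) from ten item responses.

    `responses` lists the ten SUS items in order, each rated 1-5.
    Odd-numbered items are positively worded: contribution = rating - 1.
    Even-numbered items are negatively worded: contribution = 5 - rating.
    The summed contributions are scaled by 2.5 to the 0-100 range.
    """
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses rated 1-5")
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based index: even index = odd-numbered item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Example: a generally positive respondent
print(sus_score([4, 2, 4, 1, 5, 2, 4, 1, 4, 2]))  # 82.5
```

Such a compact, closed-form scoring rule is part of what makes the SUS attractive for benchmarking, and it motivates analogous standardized scoring for chatbot-specific instruments.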
The development of the Chatbot Usability Scale (BUS-11) is one example of recent efforts to create an evaluation tool designed specifically to gauge users’ satisfaction after interacting with chatbots. This scale has been validated and established as fit for purpose by Borsci et al. (2022). The BUS-11, composed in its latest version of 11 items (Borsci, Schmettow, et al., 2022), was extensively validated and is currently available in several languages: Dutch, Italian, German, and Spanish. The researchers used a comprehensive approach to develop the construct of the scale (Borsci, Schmettow, et al., 2022), and they employed the UMUX-Lite scale (Lewis et al., 2013) as a dependent variable to establish convergent validity. Previous studies reported a positive correlation between the BUS-11 and the UMUX-Lite (Borsci, Schmettow, et al., 2022). The BUS-11 can aid practitioners in comparing and benchmarking their chatbots during product evaluation, allowing them to enhance chatbot performance. The researchers also suggested that the construct and the factors underlying the BUS scale (Borsci, Malizia, et al., 2022) can serve as an efficient checklist for chatbot designers during the development process.
Turkey has become a desirable e-commerce market for Turkish companies and foreign partnerships to invest in, owing to its population of over 80 million and its accelerating e-commerce volume (Akıl & Ungan, 2021). Alongside the e-commerce market, Turkey is also a potential growth market for chatbots, with many businesses incorporating chatbots into their operations. CBOT, the leading chatbot developer in the country, claims that 90% of customer inquiries can be handled by chatbots with appropriate task customization (İçgözü, 2020). However, it is essential to ensure that the evaluation metrics used for chatbots are not only chatbot-specific but also valid and reliable. No validated chatbot usability scale is currently available in Turkish; further research is therefore needed to ensure the efficacy and comprehensiveness of any evaluation tool used for Turkish chatbots. Validating the BUS-11 in Turkey, to facilitate the assessment and comparability of the quality of interaction with chatbots, could help practitioners benchmark their chatbots both within the country and internationally; this might unify and improve the user experience offered by these systems and thereby enhance chatbot performance. Therefore, this study aims to adapt the BUS-11 scale to the Turkish language and to assess its validity and reliability for evaluating chatbots in Turkey.