A Novel Approach to Voice of Customer Extraction using GPT-3.5 Turbo: Linking Advanced NLP and Lean Six Sigma 4.0

This research breaks new ground by utilizing the advanced natural language processing (NLP) capabilities of OpenAI's GPT-3.5 Turbo model for the extraction of Voice of Customer (VoC) data from online customer support interactions on Twitter. Traditional methods of VoC extraction have typically fallen short in capturing the richness and nuance of customer sentiment. Contemporary Machine Learning (ML) approaches, while improved, still struggle to interpret the contextual subtleties of digital customer communications effectively. This study showcases the innovative deployment of GPT-3.5 Turbo, demonstrating its superior performance in extracting VoC through a deeper understanding of conversational context and a more intuitive, chat-based data processing. Furthermore, the large-scale, multilingual processing capabilities of this model offer a more comprehensive and inclusive analysis of VoC. The study ties these advancements to Lean Six Sigma 4.0, illustrating how the integration of GPT-3.5 Turbo's transformative capabilities can elevate the customer-centric approach of Lean Six Sigma in the era of Industry 4.0. This innovative exploration points to a significant evolution in VoC analysis, offering potential for more insightful, real-time data-driven customer service strategies and a more substantial foundation for decision-making in product development and process improvement. Future research is encouraged to validate these preliminary findings and to investigate ethical considerations associated with the use of such advanced NLP models.


Introduction
The high levels of competition prevalent in all sectors of the economy have led service providers to implement different models to gain insights into customers' needs and requirements. One way of contributing to the model-building process is through the voice of the customer (VoC), a source of service-related data resulting from the proactive role played by customers who increasingly interact with companies differently. Businesses recognize the value of client feedback as a significant learning resource. Unsolicited comments written by customers in their own words are deemed information-rich, full of dynamic evaluations of the service experienced, and have a low extent of response bias. Insights on customer contentment or discontent are vital for enhancing the effectiveness and performance of offerings [1]. However, acquiring such information can be costly, often involving interviews, surveys, and market research [2]. Recently, there has been an increasing focus on determining customer requirements more effectively and impartially [3]. Digital innovations aid in addressing this challenge by creating data-driven methodologies [4]. In the past, analyzing word-of-mouth was a primary method for quality management. Nowadays, word-of-mouth has transitioned from its tangible and interpersonal nature to a digital form [5].
Web 2.0 broadly refers to the second stage of internet growth and proliferation, marked by a significant increase in user interaction with websites. This development includes higher user involvement, often as content creators (e.g., through blogs, chats, forums, and wikis), enhanced information-sharing capabilities, easier information retrieval and exchange using peer-to-peer tools or multimedia content distribution systems, and the rise of social networks [6]. Customers can now express their thoughts on products and services through online forums, blogs, and platforms, generating the so-called digital VoC [7]. Deciphering the reasons behind customer satisfaction or dissatisfaction is an ongoing challenge for companies across various sectors, regions, and markets. However, the benefits of digital VoC are often utilized solely by digital platform operators, leaving manufacturers and service providers without proper tools to analyze and harness the information [8]. As a result, digital customer feedback has emerged as a viable alternative to traditional methods of gathering feedback from customers.

Artificial Intelligence and Natural Language Processing
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on developing computer algorithms that can learn and improve from experience. It uses statistical methods to enable machines to improve with experience [9]. Deep Learning (DL) is a subset of ML that involves algorithms inspired by the structure and function of the brain, called Artificial Neural Networks (ANN) [10]. DL can process a broad range of data resources, requires less data preprocessing by humans, and often produces more accurate results than traditional ML approaches [11]. AI is a general term encompassing various technologies capable of simulating human intelligence. It represents the capacity of computers or machines to think and learn, aiming to create "smart" systems that can function independently without relying on explicit commands. John McCarthy coined the term in 1955 [12]. AI encompasses an array of technologies such as ML, DL, inference algorithms, Natural Language Processing (NLP), Neural Networks (NN), and computer vision [13]. NLP is a field of AI that enables computers to understand and interpret real-world human-to-human communication via text. It involves acquiring, processing, analyzing, and understanding real-world texts to produce numeric or symbolic information, utilizing an array of algorithms that mimic human communication abilities. Fig. 1 shows the relationship between ML, DL, AI, and NLP.
Over the last two decades, Data Mining (DM) research has advanced significantly, creating new opportunities in customer feedback analysis. Methods and technologies previously available to a select few are now more accessible and user-friendly [14]. Topic modeling is the most widely used text-mining technique. Applying topic modeling algorithms to large sets of text data enables the extraction and identification of VoC, which are the essential characteristics of products or services that profoundly impact customer satisfaction [15]. DM and ML methods enable the analysis of vast digital text datasets to extract the most pertinent information, circumventing the impracticality of human reading and interpretation.
VoC can be extracted using ML through several methodologies, including sentiment analysis, topic modeling, and text classification, among others. In sentiment analysis, ML algorithms, often based on NLP, are trained to identify and categorize opinions expressed in a piece of text, especially to determine whether the writer's attitude is positive, negative, or neutral. Topic modeling, such as Latent Dirichlet Allocation (LDA), helps discover the abstract "topics" that occur in a collection of documents. Text classification can categorize customer feedback into predefined classes. Extracting VoC through the ChatGPT-3.5 engine, however, presents a novel approach due to its enhanced conversational abilities and context comprehension. Traditional ML models can lack the ability to fully understand the context of a conversation or piece of text, but the engine behind ChatGPT-3.5, with its transformer architecture, can capture the dependencies of a word on all other words in a text, no matter how far apart they are [16]. Moreover, ChatGPT-3.5 has been trained on a diverse range of internet text, so it can generate creative and coherent responses, making it more adept at capturing the nuanced sentiments expressed in customer interactions. It also enables multi-turn conversations that provide a more interactive and iterative mode of extracting customer feedback, yielding richer and more accurate VoC data. Lastly, ChatGPT-3.5's fine-tuning capabilities allow it to adapt to specific tasks, such as customer service scenarios. This makes GPT-3.5 not only a powerful tool for VoC extraction but also a cost-effective solution.
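As an illustration of the sentiment-analysis approach described above, a minimal lexicon-based classifier can be sketched in a few lines of Python. The word lists and scoring rule below are illustrative toy choices, not the method used in this study:

```python
# Minimal lexicon-based sentiment classifier (toy example for illustration).
POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"broken", "slow", "terrible", "crash", "useless"}

def classify_sentiment(text: str) -> str:
    """Label text positive/negative/neutral by counting lexicon hits."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Production systems replace the hand-built lexicon with a trained model, but the positive/negative/neutral decision structure is the same.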
Moreover, and specifically in the field of extracting VoC from online text data, several methodologies and ML techniques have been employed. However, many traditional models have struggled to fully understand the context of a conversation or piece of text, often missing nuanced meanings or sentiments, and thus failing to accurately represent the customer's voice. The ChatGPT-3.5 engine, on the other hand, is a powerful language model developed by OpenAI that has been fine-tuned on a diverse range of internet text. It offers significant potential in addressing this research gap. Firstly, due to its transformer-based architecture, it can understand the context of a conversation more fully than previous models [17], thus capturing more accurate insights about customer sentiments and intentions [18]. Secondly, the chat-based nature of the GPT-3.5 engine allows it to extract information from customer conversations in a more natural and intuitive manner, giving it the potential to better capture the true VoC. Thirdly, the large scale of the GPT-3.5 engine means it can process a vast quantity of data in many languages, thus providing a more comprehensive and inclusive representation of the VoC. This paper explores the utilization of the engine behind ChatGPT-3.5 to extract VoC from an online customer conversation dataset. In recent decades, heightened competition has become a prominent feature across various sectors. Implementing big data analytics and artificial intelligence has been shown to improve organizational performance [19].

VoC and Lean
To maintain competitiveness, businesses must cater to evolving customer demands and deliver superior products at competitive rates. However, the intricacies of operations have given rise to numerous challenges for industries. The key to staying competitive lies in enhancing productivity through optimal resource utilization and minimizing waste and defects in products and processes [20]. As such, Lean Six Sigma (LSS) has emerged as an integral approach for addressing dynamic customer requirements [21].
In today's globalized environment, industries constantly strive to refine their processes [22,23]. Fluctuating customer expectations for high-quality products at reasonable prices and within short timeframes have compelled enterprises to adopt cutting-edge tools and state-of-the-art manufacturing systems. After three notable revolutionary phases, the manufacturing sector is undergoing the fourth industrial revolution, or Industry 4.0 (I4.0) [15], demonstrating remarkable advantages in financial and operational performance. This revolution involves the use of technologies such as the Internet of Things (IoT), Cyber-Physical Systems (CPS) [24], cloud computing, big data analytics, augmented reality, and more [25]. Academic research indicates that LSS 4.0 delivers increased customer satisfaction, superior quality, lower costs, quicker delivery, and other benefits [26], [27]. Adopting LSS 4.0 equips industries with a competitive advantage, enabling them to thrive in the marketplace. LSS 4.0 was created by merging the concepts of Lean, Six Sigma, and I4.0. Lean Six Sigma is an approach aimed at enhancing the productivity and quality of processes. I4.0 incorporates sophisticated technologies like IoT, AI, and automation within manufacturing processes. By fusing Lean Six Sigma and Industry 4.0, the objective is to attain heightened efficiency and excellence in manufacturing processes through the application of these advanced technologies. Such integration has the potential to minimize waste, boost productivity, and elevate overall performance in operations [28,29].

VoC
The initial stage in the LSS enhancement process is the Define phase. During this stage, the project team creates a Project Charter, constructs a high-level process map, and starts examining the needs of the process's customers. This crucial phase helps the group establish the project's focus for themselves and the organization's leadership. The Define phase tools allow managers to capture the VoC. In LSS, VoC refers to the customer's input, expectations, preferences, and feedback regarding a product or service under discussion. It represents the customer's statement about a specific product or service [30]. Customers who purchase or use your products/services and receive the process output can be classified into internal and external customers. Internal customers are part of the organization, including management, employees, or any functional department. In contrast, external customers are not affiliated with the organization and may be clients, end-users, shareholders, or other stakeholders. Historically, VoC has been associated with customer dissatisfaction, service failure management, and complaint resolution [31]. Dissatisfaction and service failures are viewed as factors that enable VoC [32]. Although this view was popular between the 1980s and 1990s, it has faced criticism for treating customer satisfaction as a post-service experience outcome, contrasting with the Service-Dominant Logic (SDL) perspective, which holds that customer satisfaction is co-created through the interaction between the customer and service provider throughout the entire process, not solely afterward [33]. Understanding VoC leads to understanding the value of the customer, which is the first Lean principle [34]. This understanding helps guide the strategy of any enterprise by identifying the critical measures and factors that contribute to success or failure, thus allowing for a flexible response to customer needs and providing a real competitive edge over other competitors in the market. Furthermore, gaining insights into how customers perceive value and managing their expectations help improve the quality of services [35].

Customer Needs and Requirements
A need is a customer's desire or expectation from a specific product or service. Customers may have numerous stated needs, often ambiguous and typically regarded as "wants" for a product/service. For instance, a customer may require an air conditioner for their bedroom. They need a cooler temperature in the bedroom, while their wants include quiet operation, cost-effectiveness, and low maintenance [36]. When customers state their requirements, the project team must understand and differentiate between needs and wants. The primary reason for distinguishing needs from wants is that needs are crucial features, whereas wants are expectations beyond those needs. If a product/service fails to meet customers' wants, they may be highly dissatisfied. However, if it does not fulfill a customer's needs, they will not use the product/service and will likely switch to a competitor's offering. The organization's reputation may also be at risk if needs are unmet [37]. A requirement is a product or service characteristic that satisfies a customer's need. Customers define these requirements, which are essential for a product or service. For example, in the air conditioner scenario, the customer's requirement is "cool temperature," while the other features are "nice to have." The customer may not purchase the air conditioner if it has all the "nice-to-have" features but fails to meet the requirement. Conversely, a customer may buy the product/service if it meets the requirement, whether or not it has the "nice-to-have" features [38].

Capturing VoC
Prior to the advent of ML, traditional methods for extracting VoC typically involved direct channels such as surveys, customer interviews, focus groups, feedback forms, and comment cards. The VoC approach identifies both existing (expressed) and hidden (unexpressed) customer requirements. This method allows for the collection of customer input through direct statements (customer voices) and the translation of these statements into customer needs, which are then linked to product or service output features (customer requirements) [39]. Table 1 summarizes the techniques used to generate VoC [40].
Table 1. List of methods deployed to generate VoC.

Technique
Definition

Surveys
These involve distributing a structured questionnaire to potential or current customers.While cost-effective, surveys typically have a low response rate.

Interviews
Individual meetings with potential or existing customers are conducted to ask questions and discuss responses to gain insight into customer perspectives.Interviews can address complex issues but necessitate skilled personnel.

Focus Groups
A group of individuals convenes in a conference room to discuss specific topics of interest. Focus groups excel at identifying Critical to Quality (CTQ) aspects, but their findings can be challenging to generalize.

Suggestions
Client, customer, or employee feedback is received as product or service improvement recommendations.While suggestions offer valuable opportunities for enhancement, they may not encompass the entire process.
Observations
Individuals may provide feedback based on their observations during the process, which can serve as a form of Voice of the Customer.

Digital
Any comments or feedback provided by customers in a digital format.

Businesses also gleaned insights indirectly through mystery shopping, customer reviews, and complaint analysis. However, these methods had several limitations. Table 2 summarizes the disadvantages of using these traditional methods and the advantages of using ChatGPT-3.5.

Table 2. Disadvantages of traditional VoC extraction methods and corresponding advantages of the ChatGPT-3.5 engine.

Disadvantages of Traditional Methods
Limited Scale and Depth: Traditional methods are often time-consuming, expensive, and limited in the amount of feedback they can process. They may fail to capture the full breadth and depth of the customer's experience, especially in the case of written feedback, where nuances might be lost.
Subjectivity and Bias: Manual analysis of customer feedback is prone to subjectivity and bias, which could distort the interpretation of the data.
Lack of Real-Time Analysis: Traditional methods do not typically provide real-time insights, which are increasingly crucial in today's fast-paced business environment.
Difficulty in Capturing Nuanced Sentiments: Sentiments in customer feedback are often complex and multidimensional, making them hard to capture accurately through traditional methods.

Advantages of ChatGPT-3.5 Engine
Scalability: GPT-3.5 can analyze massive volumes of data quickly, providing businesses with the ability to process and understand feedback at scale.
Deep Context Understanding: GPT-3.5's ability to comprehend context and semantics can help extract deeper insights, including subtle sentiments and nuanced opinions that traditional methods may miss.
Real-Time Insights: As an AI model, GPT-3.5 can provide real-time analysis of customer feedback, helping businesses respond more promptly to customer needs and market trends.
Objectivity: GPT-3.5 can analyze customer feedback objectively, reducing the risk of human bias.
In addition, with its efficient processing, GPT-3.5 could also be a more cost-effective solution for VoC analysis than traditional manual methods in the long run. Fig. 2 shows a fishbone diagram that summarizes the advantages of ChatGPT-3.5 over traditional VoC extraction methods.
Large Language Models

Background
As the industry experienced its fourth industrial revolution, quality management research also transformed into the new Quality 4.0 paradigm [41]. The digitization of businesses presents unique opportunities for managing product and service quality. Consequently, there is an increasing awareness of the value of user-generated data [3]. Numerous studies demonstrate how online customer feedback, especially online reviews, can be utilized to understand consumer preferences and hidden quality aspects [15]. Online customer feedback is a valuable source of customer needs, and machine learning techniques are likely to be more effective and efficient than traditional methods [7]. Historically, identifying factors influencing quality perception relied on quantitative methods, primarily using data from questionnaires and interviews [42]. While well-established, these methods can be resource-intensive in terms of personnel and time. The quality of insights obtained from these methodologies hinges on the respondent's willingness to participate and the questionnaire's complexity. Furthermore, questionnaires have several limitations, including restricted sample size, expert bias in item selection, and potential response errors [43]. An alternative approach for identifying hidden quality factors of a product or service involves analyzing online customer feedback, specifically online reviews. These reviews provide a cost-effective, unbiased, and reliable source for understanding customer opinions, expectations, and needs. This identification process is based on an in-depth analysis of such data, using data analysis tools capable of extracting information from text documents written in natural language [44]. The underlying principle of these techniques is that if a product or service feature is discussed within online feedback, it is crucial for determining the quality of the item under investigation. Most previous studies using data analysis tools to examine online customer feedback concentrated on keyword frequency and sentiment analysis [45]. A few researchers have applied topic modeling to identify quality factors [46].
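The keyword-frequency analysis mentioned above can be sketched with a simple term counter. The stop-word list here is a tiny illustrative subset; published studies typically apply TF-IDF weighting or full topic models such as LDA:

```python
from collections import Counter

# Illustrative toy stop-word list; real pipelines use a full list (e.g. NLTK's).
STOP = {"the", "is", "a", "my", "and", "it", "to"}

def keyword_frequencies(reviews, top_n=3):
    """Count content-word frequency across a collection of reviews."""
    words = []
    for review in reviews:
        words += [w.strip(".,!").lower() for w in review.split()]
    counts = Counter(w for w in words if w and w not in STOP)
    return counts.most_common(top_n)
```

Features that dominate the counts (e.g. "battery" in phone reviews) are taken as candidate quality factors for closer analysis.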

The Rise of Large Language Models
A Large Language Model (LLM) is a type of AI that applies a Deep Learning (DL) approach to large datasets to understand, summarize, generate, and gain new insights and information [47]. Newly developed LLMs, such as OpenAI's Generative Pretrained Transformer 4 (GPT-4) employed in ChatGPT and Google's LaMDA utilized in Bard, are transforming the traditional search engine paradigm. LLMs are deep neural network models trained on vast amounts of information, including books, code, articles, and websites, to grasp the underlying patterns and relationships in the language they are trained on [48]. Consequently, these models can generate coherent content, such as linguistically accurate sentences and paragraphs that resemble human language, or structurally sound code snippets [49]. LLMs have numerous applications, including language translation, summarization, and question answering, and hold potential across various fields, provided the training data offers suitable input [50]. Although LLM-generated content is typically grammatically accurate, it may not always be semantically correct. For example, the probabilistic and random choice of the "next token" while constructing outputs may impress the end user with the appearance of accuracy and style, but it can also result in errors [51]. The emergence of accessible open-source LLMs has dramatically transformed the natural language processing landscape, enabling researchers, developers, and businesses to harness these models' capabilities to create scalable solutions without incurring costs. One notable instance is Bloom, a pioneering multilingual LLM developed with full transparency through an unprecedented collaborative effort among numerous AI researchers [52]. Surpassing OpenAI's GPT-3 with its 176 billion parameters, Bloom is proficient in 46 natural languages and 13 programming languages. It has been trained on an immense 1.6 TB of textual data, equivalent to 320 times the entirety of Shakespeare's works [53].
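The probabilistic "next token" choice described above can be illustrated with a toy sampler. The vocabulary and probabilities below are invented for illustration; a real LLM samples from a softmax distribution over tens of thousands of tokens:

```python
import random

# Made-up next-token distribution for the prefix "The service was".
NEXT_TOKEN_PROBS = {"great": 0.5, "slow": 0.3, "cancelled": 0.2}

def sample_next_token(probs, rng):
    """Sample one token proportionally to its probability (roulette-wheel selection)."""
    r = rng.random()
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token  # fallback for floating-point rounding at the boundary

rng = random.Random(0)  # seeded for reproducibility
tokens = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(5)]
```

Because the draw is random, the same prefix can yield fluent but occasionally wrong continuations, which is the error mode the passage above describes.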

ChatGPT
ChatGPT is an advanced LLM created by OpenAI, an AI research and deployment organization, and was launched as a free research preview on November 30th, 2022. The goal was to gather user feedback and identify the system's strengths and weaknesses [54]. While earlier language models could perform various NLP tasks, ChatGPT stands out as an AI chatbot designed for engaging, human-like conversations [55]. Within five days of its release, over one million users had tried the tool to address detailed inquiries and produce brief texts. In the past five years, the growth of LLMs has been remarkable, with their capabilities expanding across a wide array of tasks. Before 2017, most NLP models were specifically designed for individual studies using supervised learning [56]. The introduction of the self-attention network architecture, or Transformer, in 2017 [57] led to the development of two groundbreaking models in 2018: Bidirectional Encoder Representations from Transformers (BERT) [17] and Generative Pretrained Transformer (GPT) [58]. Both models attained exceptional generalization abilities due to their semi-supervised approach, which combined unsupervised pre-training with supervised fine-tuning for downstream tasks. The GPT models have evolved quickly, with each iteration expanding its training data corpus and number of parameters. GPT-3, with 175 billion parameters, is 100 times larger than GPT-2, and its parameter count is roughly twice the number of neurons in the human brain [59]. As LLM evolution continues, the release of GPT-4 on March 14th, 2023 marks another milestone in rapid advancements [60]. Already integrated into ChatGPT, GPT-4 appears more reliable, creative, and capable of handling nuanced instructions [61]. This paper aims to discuss ChatGPT's role in generating VoC information. Fig. 3 shows an illustration of how the ChatGPT engine processes information.

The Dataset
The Twitter Customer Support [62] dataset, built with PointScrape [63], is a comprehensive, contemporary collection of three million tweets and responses from more than 20 global brands, intended to foster progress in natural language understanding and conversational models while examining current customer support practices and outcomes. Despite natural language's richness in conveying human experiences, the datasets fueling innovation often don't reflect contemporary language usage. This Twitter dataset provides a substantial volume of everyday (primarily English) conversations between customers and support agents on the platform, and it offers three key benefits over other conversational text datasets. The first is that it is a purpose-driven dataset: customers reach out to support with the intent to resolve specific issues, resulting in a narrower range of discussion topics compared to unrestricted conversational datasets like the Reddit Corpus. The second is authenticity: it represents a broader demographic of users than the Ubuntu Dialogue Corpus and exhibits more recent and natural typed-text usage than the Cornell Movie Dialogs Corpus. The final benefit is conciseness: Twitter's character constraints encourage support agents to provide genuine, non-scripted responses and prompt problem descriptions and solutions. Additionally, the platform's message length restrictions are ideal for recurrent neural networks. Figure 4 shows a breakdown of the number of discussion threads for these brands.
The dataset is formatted as a CSV file, with each row representing a tweet. Various columns provide additional information. All included conversations have a minimum of one customer inquiry and one company response. The inbound field can be used to determine which user IDs are associated with company accounts. Table 3 shows a detailed description of the fields found in the dataset, and Table 4 shows a sample of the initial format of the Apple tweets taken from the dataset. Utilizing GPT-3.5's advanced language processing abilities, users can develop chatbots that engage effortlessly with users for diverse purposes, such as addressing inquiries, crafting narratives, managing finances, and offering therapeutic support. The API's potential applications are restricted only by human creativity, and it is exciting to witness how developers will keep expanding the horizons of AI's capabilities. The recently introduced "ChatGPT" API is known as Chat Completion. By employing the OpenAI Chat API with GPT-3.5-Turbo and GPT-4, you can develop customized applications to accomplish tasks like composing emails or other written content, generating Python code, responding to queries about a collection of documents, developing interactive agents, implementing a natural language interface for your software, providing tutoring in various subjects, translating between languages, simulating characters for video games, and much more. Conversation models process a sequence of messages as input and produce a message generated by the model as output. While the chat structure is tailored to facilitate multi-exchange interactions, it is equally valuable for single-exchange tasks that don't involve conversation, in an instruction-following format [64].
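As a minimal sketch, a CSV in the layout described above can be read with Python's standard csv module. The field names and sample rows below are illustrative assumptions based on the dataset description, not an excerpt of the actual file:

```python
import csv
import io

# Two illustrative rows in the dataset's assumed field layout.
SAMPLE = """tweet_id,author_id,inbound,created_at,text
1,115712,True,Tue Oct 31 22:10 +0000 2017,@AppleSupport my phone keeps freezing
2,AppleSupport,False,Tue Oct 31 22:12 +0000 2017,@115712 We can help. Which iOS version are you on?
"""

with io.StringIO(SAMPLE) as f:  # in practice: open("twcs.csv")
    rows = list(csv.DictReader(f))

# inbound == "True" marks customer messages; company replies are "False".
customer_rows = [r for r in rows if r["inbound"] == "True"]
```

Splitting rows on the inbound flag is the first step toward pairing each customer inquiry with the corresponding company response.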
The primary input consists of the messages parameter, which should be an array of message objects. Each object has a role (either "system", "user", or "assistant") and content (the message's content). Conversations can range from just one message to several pages long. Usually, a conversation starts with a system message, followed by alternating user and assistant messages. The system message helps guide the assistant's behavior, while user messages provide instructions. These can come from an application's end-users or be set by a developer. Assistant messages store previous responses and can be used by developers to demonstrate desired behavior. Language models process text in units called tokens. In English, a token can be as short as one character or as long as one word (e.g., "a" or "apple"). In other languages, tokens may be shorter or longer. For example, the string "ChatGPT is great!" consists of six tokens: ["Chat", "G", "PT", " is", " great", "!"]. The total number of tokens in an API call affects the cost (as you pay per token) and the time (since generating more tokens takes longer), and must be below the model's maximum limit (4096 tokens for GPT-3.5). Both input and output tokens count toward these quantities. Chat models like GPT-3.5-Turbo and GPT-4 utilize tokens like other models, but the message-based format makes it harder to count tokens in a conversation: each message passed to the API consumes the tokens of its content plus additional tokens for internal formatting. If a conversation exceeds the model's maximum token limit, you will need to truncate, omit, or shrink your text to fit. Remember that removing text from the message input causes the model to lose all knowledge contained in that text. Best practices for guiding models may vary between GPT-3.5 and GPT-4; common approaches
include making instructions more explicit, specifying the desired answer format, or asking the model to think through steps or debate the pros and cons before responding. Prompt engineering is essential for developers working with AI language models, as it helps ensure that the generated output aligns with user expectations and requirements. For example, many conversations start with a system prompt that gently steers the model. If the model does not produce the desired output, experiment and iterate: make the instruction more explicit, specify the desired answer format, or ask the model to think step by step or weigh the pros and cons before providing an answer [66].
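Putting the message format, token budgeting, and prompt-engineering guidance above together, a sketch of a VoC-extraction request might look as follows. The system prompt wording and the word-based token estimate are our own illustrative assumptions, not the study's exact prompt or OpenAI's tokenizer:

```python
# Build a Chat Completion request for VoC extraction (illustrative only).
messages = [
    {"role": "system",
     "content": ("You are an analyst. Extract the Voice of the Customer from "
                 "the user's tweet. Reply in JSON with keys 'need' and "
                 "'sentiment'.")},
    {"role": "user",
     "content": "@AppleSupport my battery drains in two hours after the update"},
]

# Rough token budgeting: ~0.75 words per token is a common rule of thumb;
# an exact count requires the model's tokenizer (e.g. the tiktoken package).
approx_tokens = sum(round(len(m["content"].split()) / 0.75) for m in messages)
assert approx_tokens < 4096  # stay under GPT-3.5's context limit

# The actual request (requires the openai package and an API key):
# response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
# reply = response["choices"][0]["message"]["content"]
```

Constraining the answer format in the system message makes each reply machine-parseable, so thousands of tweets can be processed into structured VoC records.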

Text Preprocessing
Preprocessing of a text dataset is a crucial step in any NLP task.It transforms raw text data into an understandable format for NLP models and algorithms.Since text data can be unstructured, noisy, and ambiguous, preprocessing helps to clean and organize the data so that algorithms can easily interpret it.The process starts with ltering the DataFrame only to include inbound tweets directed to AppleSupport.This is done by checking if the text of each tweet begins with "@AppleSupport".This is a part of the data preprocessing phase when working with pandas DataFrame in Python.This step is crucial for focusing on speci c subsets of data based on certain conditions or criteria.Filtering is essential because it allows you to focus on particular parts of your dataset that are relevant to the analysis or model you're working on.It helps remove unimportant or irrelevant data, reduce the computational cost, and increase your results' accuracy.The text data cleaning continues for further processing or analysis through a series of steps to transform the raw text into a more digestible form, making it easier for ML models to learn from.The cleaning process involved removing URLs, Twitter usernames, memorable characters, and stop words, converting all text to lowercase, performing lemmatization and stemming, and handling emojis by replacing them with their textual description using the emoji library.The pd.to_datetime() function in pandas converts the 'created_at' eld to datetime format.Then the text under 'author_id' eld was grouped, and the tweets within each group was sorted based on their data.This is to ensure that the tweets are in chronological order for each user and to select the rst tweet from each group as the starting point of each conversation.Multiple libraries were utilized in Python for various tasks related to preprocessing text datasets.Table 5 shows a summary of Python functions that were utilized in this process and what was their function in the 
preprocessing steps.

matplotlib — A plotting library used for data visualization, which supports the preprocessing steps by helping to understand the distribution and characteristics of the data.
json — JSON data is quite common in web data, and the json library allows Python to read JSON files into a format that can be manipulated as Python objects. This is useful in preprocessing, as many text datasets come in JSON format.
regex — Text data often needs cleaning and formatting. The re library allows for advanced string manipulation using regular expressions. It can be used to remove unwanted characters, extract specific information, and perform many other string preprocessing tasks.
nltk — The Natural Language Toolkit library is used for many natural language processing tasks. It includes capabilities for tokenizing text (breaking text up into words or other meaningful components), identifying parts of speech, stemming and lemmatization (reducing words to their root form), and much more.
emoji — This library helps in handling and manipulating emojis in text data. In text preprocessing, it is often helpful to convert emojis to their meaning in words, or to remove them, which can be done efficiently with this library.
contractions — This library handles contractions in English text, which is crucial in text preprocessing because it expands contracted words to their complete form, for instance converting "don't" to "do not". This helps maintain consistency in the text data and improves the performance of downstream tasks such as text classification or sentiment analysis.
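The filtering, cleaning, and chronological ordering steps described above can be sketched as follows. This is a minimal illustration rather than the authors' exact code: the sample DataFrame, the clean_text helper, and the small inline stop-word set (standing in for NLTK's English list) are assumptions made to keep the example self-contained.

```python
import re
import pandas as pd

# A minimal stop-word set; the study uses nltk.corpus.stopwords, which
# covers far more words. This inline set is an assumption for illustration.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "my", "and"}

def clean_text(text):
    """Apply the cleaning steps described above to a single tweet."""
    text = re.sub(r"http\S+", "", text)      # remove URLs
    text = re.sub(r"@\w+", "", text)         # remove Twitter usernames
    text = re.sub(r"[^a-zA-Z\s]", "", text)  # remove special characters
    text = text.lower()                      # convert to lowercase
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

# Invented sample data standing in for the Apple Support dataset.
df = pd.DataFrame({
    "author_id": ["u1", "u1", "u2"],
    "created_at": ["2017-10-02 10:00", "2017-10-02 09:00", "2017-10-03 12:00"],
    "text": [
        "@AppleSupport my iPhone is draining battery http://t.co/x",
        "@AppleSupport the update broke WiFi",
        "Just got a new phone!",
    ],
})

# Keep only inbound tweets directed at @AppleSupport.
df = df[df["text"].str.startswith("@AppleSupport")].copy()

# Convert 'created_at' to datetime and order each author's tweets
# chronologically, so each group's first tweet opens the conversation.
df["created_at"] = pd.to_datetime(df["created_at"])
df = df.sort_values(["author_id", "created_at"])
df["clean"] = df["text"].apply(clean_text)
first_tweets = df.groupby("author_id").head(1)
```

Lemmatization, stemming, and emoji handling (e.g. emoji.demojize) would be added to clean_text in the same fashion.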
The Natural Language Toolkit (NLTK) facilitates text preprocessing and analysis through resources such as Stopwords, WordNet, and Punkt. In natural language processing, "stop words" are commonly used words that are often filtered out because they carry little meaningful information; examples include "the", "and", and "is". NLTK provides lists of such words in multiple languages. By removing these words, you can focus on the words that carry meaning. WordNet is a lexical database of English words. It can be used to find the meanings of words, synonyms, antonyms, and more, and it is also useful for semantic understanding tasks. For example, it helps with lemmatization: reducing inflected (or sometimes derived) words to their word stem, base, or root form. WordNet's structure makes it a valuable tool for computational linguistics and NLP. Punkt is a resource used in tokenization (breaking up text into words, phrases, symbols, or other meaningful elements, known as tokens). The Punkt tokenizer is an unsupervised ML tokenizer pre-trained to know where to split text into sentences. It is beneficial when you have large chunks of text to break down into sentences. These resources are often used together in text preprocessing: for instance, you might tokenize your text into sentences, then into words, remove stop words, and apply lemmatization using WordNet. The expand_contractions function was used in text preprocessing to convert contracted words into their complete forms. A contraction is a shortened version of a word or multiple words in which an apostrophe replaces the missing letter(s). Examples include "don't" for "do not", "it's" for "it is", and "I'm" for "I am". Expanding contractions is important for several reasons, including standardization, disambiguation, and improved text understanding. Text data comes from a variety of sources and in various formats; by expanding contractions, you ensure your text data is more standardized, which can improve the performance of your
subsequent analysis or machine learning model. Some contractions can have more than one meaning depending on the context (for example, "it's" could mean "it is" or "it has"); expanding them helps disambiguate their meaning. Some NLP tasks or models may also perform better with full words rather than contractions. For instance, certain sentiment analysis tools might recognize that "is not" is a negation while "isn't" could be overlooked. The expand_contractions function automates this process of turning contractions into full words, improving the quality and consistency of the text data for further processing or analysis.
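A minimal sketch of what expand_contractions does is shown below. The study uses the contractions library (whose contractions.fix() covers far more forms); the small dictionary here is a stand-in assumption so the example is self-contained.

```python
import re

# Tiny contraction map for illustration only; the `contractions` library
# used in the study handles a much larger set of English contractions.
CONTRACTIONS = {
    "don't": "do not",
    "it's": "it is",
    "i'm": "i am",
    "isn't": "is not",
    "can't": "cannot",
}

def expand_contractions(text):
    """Replace each known contraction with its expanded form (case-insensitive)."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)

expanded = expand_contractions("I'm stuck and it's not working, don't know why")
```

Case is lowered during expansion here, which is harmless because the pipeline lowercases all text anyway.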

Processing Conversations
The extract_conversation function was used to pull out individual back-and-forth dialogues between different users or participants. The function was applied to each user's tweets to extract all conversations and save them in a JSON file. The convert_conversation function was then used to format each conversation as input for the GPT-3.5-turbo model. It starts by appending the system message to the formatted_conversation list, then iterates over the conversation list, assigning the role "user" or "assistant" to each message based on its position in the list. For each conversation in the list of formatted conversations, a ChatCompletion request to the GPT-3.5-turbo model was created, with the formatted conversation as input. A ChatCompletion request is a type of API call made to OpenAI's GPT-3.5-turbo model to create interactive and dynamic conversations with the AI model. In this kind of request, you send a series of messages as input, and the model returns a generated message as output. Each message in the input has a 'role', which can be 'system', 'user', or 'assistant', and 'content', which is the actual text of the message. The model's response is then extracted and appended to a results list. The API call is made by initializing the OpenAI API with the necessary API key. This step sets up and authenticates the connection with the OpenAI API, enabling the program to interact with OpenAI's services, such as GPT-3.5-turbo. OpenAI uses API keys to authenticate requests made to its services; an API key is like a password that helps OpenAI identify who is making the request. It is recommended to store the API key in an environment variable or a secure, encrypted file rather than including it directly in the code. The response list was converted into a DataFrame and stored in an Excel file. Each response was split into customer need and customer requirement columns based on the newline
character. Figure 5 shows a summary of this section.
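The formatting step described above can be sketched as follows. The system-message text is abbreviated and the sample tweets are invented for illustration; the actual request (commented out) follows the legacy openai<1.0 interface described in the text.

```python
# Abbreviated system message; the full prompt is shown in Table 6.
SYSTEM_MESSAGE = {
    "role": "system",
    "content": "I want to map Customer Comments to Customer Needs "
               "and Customer Requirements...",
}

def convert_conversation(conversation):
    """Format a chronologically ordered list of tweet texts for the
    GPT-3.5-turbo chat endpoint: messages in even positions (the
    customer's) get the "user" role, odd positions (the support
    agent's) get the "assistant" role."""
    formatted_conversation = [SYSTEM_MESSAGE]
    for i, text in enumerate(conversation):
        role = "user" if i % 2 == 0 else "assistant"
        formatted_conversation.append({"role": role, "content": text})
    return formatted_conversation

messages = convert_conversation([
    "@AppleSupport my phone keeps using mobile data after the update",
    "Which iOS version are you running?",
])

# The actual request (requires the openai package and a valid key, read
# from an environment variable rather than hard-coded):
#   import os, openai
#   openai.api_key = os.environ["OPENAI_API_KEY"]
#   response = openai.ChatCompletion.create(
#       model="gpt-3.5-turbo", messages=messages)
#   reply = response["choices"][0]["message"]["content"]
#   # Split the reply into the two columns on the newline separator:
#   need, requirement = reply.split("\n\n", 1)
```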

Prompt Engineering
The prompt engineering technique provided clear instructions to the model in the system message by giving examples of how to map customer comments to customer needs and requirements. In the code, the system message instructs the model with the prompt shown in Table 6, which asks the model to answer in the form "Customer Need: <Your response about the customer need from the customer comment> Customer Requirement: <Your response about the customer requirement from the customer comment>". The model was also fed with examples to guide its understanding of the task, as shown in the segment of code in Table 7. Figure 6 shows a summary of this section.
Table 7 Example of conversation fed to the model.

messages_list = [ {"role": "system", "content": "I want to map Customer Comments (What are the customers saying?) to Customer Needs (What do the customers need?) and Customer Requirements (What is required to fulfill the customer's need?) I can provide you with a couple of examples: \ 1): \ Customer comment: The product is too complicated, and I don't know how to use it. \ Customer Need: Customers need instructions on how to use the product. \ Customer Requirements: Videos, help center, articles, and webinars that will educate them on how to use the product. \ ... If I input a message or a conversation between a customer and some customer support agent, you need to provide: \ Customer issues: <Your response about the customer need from customer comment> \n\n Customer Requirement: <Your response about the customer requirement from customer comment>\n"} ]

Results and Discussion
This work utilized an LLM, GPT-3.5, to generate VoC from customer conversations. The Apple Support set of tweets contains over 7,000 conversations. All of these tweets have been processed and are intended to be published online soon, but due to space constraints in paper publication, only a sample of the work is presented in Table 8 below. For example: the customer is experiencing excessive data usage on their iOS device after installing an update and is unable to identify the cause; the customer therefore requires assistance in identifying the root cause of the excessive data usage and a solution to address the issue. VoC analysis provides a feedback mechanism that can guide improvement efforts. By understanding customers' needs and wants, businesses can make informed decisions about product development, marketing strategies, and more. For example, requirement #1 indicates that developers at Apple might want to start thinking about how they can make home sharing more manageable and more global. VoC provides valuable insights into what customers like and dislike about specific products, services, or the overall customer experience. Requirement #2 points to the need to provide an uninterrupted app experience so that customers can feel comfortable using their phones and trust their phone's reliability. VoC helps address customer concerns, and improving based on customer feedback can lead to a stronger brand reputation, potentially attracting new customers and retaining existing ones.
Requirements #3, #5, and #6 fall under the theme of improving the device's current software and/or hardware. By addressing customer pain points and improving their experience, VoC analysis can help reduce customer churn. Requirement #4 suggests that developers might want to increase the functionality of their hardware by providing more services and tools to their users. VoC analysis can help identify emerging trends, allowing businesses to stay ahead of the market by innovating or adapting their offerings based on customer preferences and expectations. Requirement #7 points to a potential malfunction (bug) or a security threat causing connectivity issues. Businesses can improve customer satisfaction by actively listening and responding to VoC, increasing customer loyalty and retention.
Happy and satisfied customers are more likely to make repeat purchases and recommend the business to others, driving revenue growth. VoC analysis is vital for companies to understand their customers better, improve their offerings, and ultimately succeed in today's customer-centric market environment.
While the GPT-3.5-turbo model presents a significant advancement in the realm of NLP, it has limitations when generating VoC insights from online tweets. Because the training data shapes the model's understanding of language and context, exposing it to new, diverse, and relevant datasets can help it adapt to evolving language use, slang, cultural references, and trending topics. This would improve its ability to accurately understand and respond to the most recent and relevant customer sentiments expressed in online tweets. Additionally, the model's reliance on historical data can limit its effectiveness in identifying emerging trends or changes in customer sentiment. Furthermore, GPT-3.5-turbo cannot verify real-time data or access any data after its training cut-off, leading to potential gaps in understanding customer sentiment. Lastly, issues such as the model generating inappropriate content or reflecting biases in the training data can also pose significant limitations. Figure 7 shows how the limitations of the GPT-3.5 engine in extracting VoC can be addressed. LSS, as a methodology, is rooted in eliminating waste, reducing process variability, and improving customer satisfaction. It relies heavily on VoC to identify areas of improvement, set the right quality standards, and ensure a customer-focused approach. The innovation brought by the GPT-3.5 engine in extracting VoC is particularly potent when incorporated within the LSS 4.0 framework. In this context, GPT-3.5 becomes an enabler, enhancing the ability of LSS 4.0 to understand the customer's voice more accurately. The advanced NLP capabilities of the model provide a more comprehensive and insightful analysis of customer sentiment, helping businesses to discern their customers' unmet needs and expectations. These insights can then feed directly into the Define and Measure stages of the LSS DMAIC (Define, Measure, Analyze, Improve, Control) process, thereby helping to identify key performance indicators, set
realistic targets, and uncover root causes of customer dissatisfaction. This connection becomes critical because the innovative use of GPT-3.5 Turbo, as shown in this study, has potential implications for significantly advancing the application of LSS 4.0. By harnessing the model's advanced NLP capabilities, businesses can gain a deeper, more nuanced understanding of their customers' voice. This enriched perspective aids LSS's mission of continual improvement, helping businesses to tailor their product development, marketing strategies, and overall processes to better meet customer needs, thus driving superior customer value and operational efficiency in the era of I4.0.

Conclusion
Integrating customer feedback into the new product development process is critical across industries.
The inherent intangibility of services adds a layer of complexity to customer feedback in service-based sectors. Nevertheless, text analysis has come to the forefront as a potential solution, offering a deep dive into customer perspectives and delivering valuable insights to guide product developers' future endeavors. The focal point of this article has been exploring ChatGPT as a robust tool for augmenting our understanding of the customer experience. Future work can focus on addressing the limitations of this method. Managing the limitations of GPT-3.5-turbo in generating VoC insights from online tweets involves several strategies. One approach is to augment AI interpretation with human review, which can help decipher complex sentiments and contextual nuances that AI might miss. Further, developing a feedback loop that informs model refinement could lead to ongoing improvement in the system's understanding and handling of data. A second approach is to employ a hybrid model of data analysis, combining the capabilities of AI with other analytic techniques to capture emerging trends or shifts in sentiment that the AI model might miss due to its reliance on historical data. Future work might address real-time data limitations by combining the outputs of GPT-3.5-turbo with a model trained on more recent data, bridging the gap in understanding current customer sentiments. Lastly, future research could focus on mitigating the risk of generating inappropriate content or biases; developers can incorporate explicit guidelines or rules into the AI system, and ongoing auditing and monitoring of AI outputs can help catch and correct such issues. It is also essential to remember that AI systems like GPT-3.5-turbo are tools that should be part of a broader, multifaceted approach to understanding customer sentiment rather than standalone solutions.

Figures

Figure 1

Figure 4

Table 3
Description of categories found in the dataset.
in_response_to_tweet_id — The tweet ID to which the current tweet is replying, if applicable.

Table 4
Sample of the Apple Tweets (Texts are as is from the dataset).
Following the launch of the ChatGPT Application Programming Interface (API) on March 1, 2023, many applications have been created, ushering in a new era of potential for both businesses and individuals.
Good input equals good output. Prompt engineering is the craft of creating effective and precise input prompts to guide an AI language model, such as ChatGPT, toward generating desired responses or output. It involves experimenting with different phrasings of questions, instructions, or conversation contexts to obtain the most accurate, relevant, and valuable information from the AI model. Prompt engineering aims to maximize the performance and utility of the language model by iterating on and refining the input prompts. This can

Table 5
Utilized functions.

pandas — Fundamental in handling structured data. It provides data structures and functions for manipulating and analyzing structured data, such as text-based datasets. In addition, it allows easy reading, writing, and manipulation of tabular data.

numpy — While numpy is primarily used for numerical computations, it works well with pandas and other libraries to support an efficient Python data manipulation interface. It is also used to convert data to numpy arrays, often the input type needed for machine learning libraries.

Table 6
Example of prompt engineering used. "I want to map Customer Comments (What are the customers saying?) to Customer Needs (What do the customers need?) and Customer Requirements (What is required to fulfill the customer's need?) I can provide you with a couple of examples: ... If I input a message or a conversation between the customer and some customer support agent, you need to provide:

Table 8
Illustration of VoC results.