According to [20,21], social media has helped companies engage with and recognize their customers and has supported them in connecting with loyal customers. Interactions in the online environment can create a relaxed atmosphere for both customer-customer and customer-company exchanges [22]. This “beyond purchase” behavioral dimension of customer engagement includes manifestations such as social influence through word of mouth and customer recommendations [23,24]. Thus, customers’ engagement with online retailers over social networks has an important influence on brand image. From the retailers’ perspective, the online user community’s feedback can be monitored more effectively so that action can be taken when it is needed. On the other hand, the flow of the conversation between a customer and a retail service provider may affect other customers’ perception of that provider.
Therefore, this work aims to explore the impact of customers’ online interactions with the service provider on the polarity of their opinions towards retail service providers. It is widely known that when a dedicated channel is provided for customers to express their opinions, they are more likely to use it to voice negative comments. However, company engagement with customers has a positive impact on customers’ sentiments towards the brand. A significant reduction of negative customer sentiment can be obtained after appropriate interactions with retail service providers. Therefore, the proposed framework aims to identify the customer’s Conversation Polarity Change (CPC) at the end of a conversation with the retail service provider (in this study, AmazonHelp is used as a case study). Furthermore, prediction of the sentiment of the customer’s text at the end of the conversation is examined to find out the effect of different conversation features, such as length and text, on the improvement or deterioration of the customer’s opinion of the customer service provider’s behavior. To achieve these objectives, a lexicon-based sentiment tool is used to label the conversations [25]; then, different machine learning techniques are trained on the labeled data to classify the change of polarity of the customers’ conversations. The main phases of the proposed framework, as shown in Figure 1, are:
- Conversation Extraction: extracting conversations between the customer and the retail service providers (henceforth referred to as customer conversations).
- Feature Engineering: converting the raw data extracted from the conversations into feature vectors.
- Conversation Labeling: identifying and labeling the polarity of the customer’s tweets at the start and the end of each conversation, and then computing the CPC.
- Feature Grouping: categorizing both the raw data and the calculated data extracted from the customer’s conversation into different feature groups/sets.
- Applying Different Machine Learning Models: detecting the CPC and predicting the polarity of the customer’s tweet at the end of the conversation.
The details of each of these phases are explained in the following subsections.
3.1 Conversation Extraction
Twitter has become a valuable source of data that can be efficiently used in different domains and applications. Such data are mainly used for marketing and social studies [27,28]. Companies can add a deep link to their tweets that automatically displays a call-to-action button, which allows the customer to send the business a direct message. In this work, we considered only the tweets sent to “@AmazonHelp” [29], the official Twitter customer support account of Amazon, one of the world’s largest online retailers. Analyzing data that represent the conversations, as well as the change of customer polarity throughout the conversation flow, gives insight into the significant features that affect such a change of polarity.
Only messages written in English were extracted using the Twitter API [1]; that is, Twitter was searched for all English tweets mentioning the account “@AmazonHelp”. Pre-processing was performed in two steps: first, whole conversations were extracted and their text was cleaned; second, lemmatization and grammatical tagging were performed on the cleaned tweets. Traditionally, a conversation thread has a tree structure, where the parent is the source tweet and the rest can be split into individual branches, each starting with either a reply from the customer service provider or a reply from an external customer/user. For this research, each conversation was flattened to only two levels: the first level is the source tweet, and the second level holds all of its replies, including replies to replies. To identify the source tweet that represents the start of each conversation, the returned tweets were filtered so that only tweets that are not a reply to any other tweet are considered source tweets. After obtaining the source tweets of the customer conversations with AmazonHelp, Twitter was searched to retrieve the rest of the conversation for each source tweet. The returned conversations contained tweets of the original customer (source user), AmazonHelp, and any external customers/users who took part in the conversation. Next, the text was cleaned by stripping URLs, usernames, hashtags, and emoticons. Tweets were also normalized by removing special characters and any separators other than blanks.
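The flattening and cleaning steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tweet record fields (`id`, `author`, `text`, `in_reply_to`), the function names, and the regular expressions (including the simple emoticon pattern) are all hypothetical, and lemmatization and grammatical tagging are not covered.

```python
import re

# Assumed record layout (hypothetical): each tweet is a dict with the keys
# 'id', 'author', 'text' and 'in_reply_to' (None when it replies to nothing).

def flatten_conversations(tweets):
    """Group tweets into two levels: source tweet id -> list of all replies."""
    by_id = {t["id"]: t for t in tweets}
    # Source tweets are the tweets that are not a reply to any other tweet.
    conversations = {t["id"]: [] for t in tweets if t["in_reply_to"] is None}
    for t in tweets:
        if t["in_reply_to"] is None:
            continue
        # Walk up the reply chain so that replies to replies land under the
        # same source tweet; 'seen' guards against malformed cyclic data.
        root, seen = t, set()
        while root["in_reply_to"] in by_id and root["id"] not in seen:
            seen.add(root["id"])
            root = by_id[root["in_reply_to"]]
        if root["id"] in conversations:  # drop replies whose source is missing
            conversations[root["id"]].append(t)
    return conversations

# Illustrative patterns only; a real pipeline may need broader coverage.
URL_RE = re.compile(r"https?://\S+")
HANDLE_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][-']?[()\[\]DPpOo/\\|](?!\S)")
SPECIAL_CHAR_RE = re.compile(r"[^A-Za-z0-9\s]")

def clean_text(text):
    """Strip URLs, usernames, hashtags, emoticons and special characters."""
    for pattern in (URL_RE, HANDLE_OR_HASHTAG_RE, EMOTICON_RE, SPECIAL_CHAR_RE):
        text = pattern.sub(" ", text)
    # Collapse every run of separators into a single blank.
    return " ".join(text.split())

print(clean_text("@AmazonHelp my order is late :( see https://t.co/abc #fail!!"))
# -> my order is late see
```

Note that removing all special characters also splits contractions (for example, "didn't" becomes "didn t"), which is one reason a subsequent lemmatization step is useful.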
3.2 Feature Engineering
During this phase, the raw data of both the conversation and the source user are mapped into features to be used in the classification and prediction tasks. Following the previous phase, these attributes were extracted from the Twitter API for each conversation:
- Tweet features: starting with the source tweet, the features extracted for each tweet are: text, author, time, hashtags, media and URLs, retweet count, and favorite count.
- Source user features: namely, the author’s verified status (true/false), the author’s followers count, and the author’s friends count.
- Conversational features: further features required by the classification model, such as the conversation length (number of tweets in the conversation), the total duration of the conversation, the number of external users, and the number of replies/comments by external users in the conversation. Note that external users are any users participating in the conversation other than AmazonHelp and the original author of the source tweet.
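As a rough illustration of the conversational features listed above, the sketch below derives them from a list of tweet records. The record fields (`author`, `time`), the function name, and the choice of seconds as the duration unit are assumptions made for the example.

```python
from datetime import datetime

def conversational_features(tweets, source_user, provider="AmazonHelp"):
    """Compute conversation-level features from tweet dicts.

    Each tweet is assumed (hypothetically) to carry an 'author' string and
    a 'time' datetime; the order of the list does not matter here.
    """
    times = [t["time"] for t in tweets]
    # External users: anyone other than the provider and the source user.
    external = [t for t in tweets if t["author"] not in (source_user, provider)]
    return {
        "conversation_length": len(tweets),
        "duration_seconds": (max(times) - min(times)).total_seconds(),
        "num_external_users": len({t["author"] for t in external}),
        "num_external_replies": len(external),
    }

example = [
    {"author": "customer", "time": datetime(2020, 12, 19, 16, 44, 57)},
    {"author": "AmazonHelp", "time": datetime(2020, 12, 19, 16, 53, 46)},
    {"author": "bystander", "time": datetime(2020, 12, 19, 17, 0, 0)},
    {"author": "bystander", "time": datetime(2020, 12, 19, 17, 5, 0)},
]
print(conversational_features(example, source_user="customer"))
```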
Only these features are considered in this work, as it has been found that different indicators, such as the length of the tweet and the number of replies, can affect the emotional transitions between customers and the service provider agent [19]. Furthermore, it is important to consider the conversation length when studying the factors that lead to converting a customer’s negative polarity into a positive one (a change of customer attitude). Therefore, some features had to be calculated from the raw data and included in the feature vector, as explained in section 3.4. All of the previous features, along with the sentiment features in section 3.3, were mapped into several feature groups that are discussed in section 3.4.
3.3 Conversation Labeling Process
The conversation labeling process aims to identify the customer’s perception at the beginning and the end of the conversation. This is done by detecting the polarity of the customer’s tweets in an interactive conversation, as these tweets carry the customer’s perception of the product/service and reflect the brand image. The process provides labels that represent the user’s point of view as positive (expressing positive sentiment), negative (expressing negative sentiment), or neutral (expressing unbiased sentiment or no sentiment). It depends on applying a sentiment analysis tool to the first and last tweets of the customer in each conversation. The Stanford API [26], which applies a deep learning technique, was used because it achieved an accuracy ranging from 89% to 100% on user tweets according to [25]. The conversation labeling process is divided into two main steps, which are explained in detail below:
- Tweets labeling
- Conversation Polarity Change (CPC) process.
3.3.1 Tweets Labeling
The Stanford API was applied to identify the customer’s attitude in the conversation by labeling the customer’s first and last tweets. The tool uses a Recursive Neural Network (RNN) model that determines polarity by considering the sentence structure, which avoids losing the sentence’s meaning and semantics. The sentence sentiment labeling estimates the strength of polarity on five levels: 2, 1, 0, -1, and -2, from very positive to very negative, respectively.
Table 1. A sample of labeled extracted conversation
| Tweet | Tweet text | Creation Time | Sentiment |
| --- | --- | --- | --- |
| Source user first tweet | "My Amazon pay balance is locked for last 6 mnths... they assure me that issue would be resolved, but it didn't" | 2020-12-19 16:44:57 | -1 |
| Source user second tweet | "Some asked me to do KYC, but kyc is not available in my area.……, but never received my pay balance." | 2020-12-19 16:44:58 | |
| Source user third tweet | "Some told me that they have escalated this issue to the highest level, …… frustration i have with Amazon service in last 6 months." | 2020-12-19 16:44:58 | |
| Amazon help first tweet | "We're sorry to know your Amazon Pay balance is locked. ….. get back to you soon." | 2020-12-19 16:53:46 | |
| Source user last tweet | "It has been more than 24hrs ….. my 18th attempt …." | 2020-12-22 03:05:51 | -1 |
On Twitter, the source user may express his/her opinion by writing several tweets before the service provider replies. This occurs for one of two reasons: either the limit on tweet length, as Twitter allows only 280 characters per tweet, or a delay in the response from the service provider, which may drive the customer to deliver his/her perception through many tweets. In such cases, the conversation labeling process treats these tweets as one chunk of text representing the user’s point of view, provided that they are written sequentially within a specific time window (namely 60 minutes). This time window was chosen after observing the collected conversations, in which most of the opening tweets that actually form one long message occur within that period. Tweets that occur after that time and before the customer service replies are usually separate tweets and might carry a different sentiment than the earlier ones. A sample conversation is shown in Table 1, where the source user started the conversation with 3 consecutive tweets within a 2-minute duration. The 3 tweets were merged into one piece of text and labeled with a sentiment of -1.
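The merging rule described above can be sketched as follows. This is only an illustration: the record fields and function name are hypothetical, and the code assumes that the 60-minute window is measured from the first tweet of the conversation, which the text does not state explicitly.

```python
from datetime import datetime, timedelta

def merge_first_tweets(tweets, source_user, provider="AmazonHelp",
                       window=timedelta(minutes=60)):
    """Join the source user's opening tweets into one chunk of text.

    tweets: chronologically sorted dicts with 'author', 'time', 'text'
    (hypothetical layout). Collection stops at the provider's first reply
    or once a tweet falls outside the window (assumed to start at the
    first tweet of the conversation).
    """
    if not tweets:
        return ""
    start = tweets[0]["time"]
    chunk = []
    for t in tweets:
        if t["author"] == provider:
            break  # the service provider has replied
        if t["author"] != source_user:
            continue  # skip tweets from external users
        if t["time"] - start > window:
            break  # too late to count as part of the opening message
        chunk.append(t["text"])
    return " ".join(chunk)

# Timestamps follow Table 1; the texts are shortened placeholders.
conversation = [
    {"author": "customer", "time": datetime(2020, 12, 19, 16, 44, 57), "text": "first"},
    {"author": "customer", "time": datetime(2020, 12, 19, 16, 44, 58), "text": "second"},
    {"author": "customer", "time": datetime(2020, 12, 19, 16, 44, 58), "text": "third"},
    {"author": "AmazonHelp", "time": datetime(2020, 12, 19, 16, 53, 46), "text": "reply"},
    {"author": "customer", "time": datetime(2020, 12, 22, 3, 5, 51), "text": "last"},
]
print(merge_first_tweets(conversation, source_user="customer"))
# -> first second third
```

The merged chunk would then be passed to the sentiment tool as a single text, so that the opening message receives one label.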
3.3.2 Conversation Polarity Change (CPC) process
To examine the change in the customer’s view of the customer service provider, it is important to observe the pattern of dynamic emotional transitions that occur during conversations and how different features influence the customer’s perception. Therefore, this step investigates the conversation’s emotional direction by considering the sentiment of the first tweet and the sentiment of the last tweet. The emotional direction may change positively, change negatively, or remain the same over the course of the conversation. This reflects the customer’s satisfaction/perception by the end of the conversation. The CPC process ends by assigning each conversation a label that indicates the change of the customer’s attitude throughout the conversation. This is done by calculating the difference between the sentiments of the last tweet and the first tweet, as shown in Equation 1.

$$\mathrm{CPC}_i = \mathrm{SL\_EndTweet}_i - \mathrm{SL\_StartTweet}_i \qquad (1)$$

where:
$\mathrm{CPC}_i$ indicates the polarity change of conversation $i$, $\mathrm{SL\_EndTweet}_i$ is the sentiment label of the end tweet of conversation $i$,
and $\mathrm{SL\_StartTweet}_i$ is the sentiment label of the starting tweet of conversation $i$.
The CPC has 9 potential values, ranging from 4 to -4, which show the strength of the change in the customer’s attitude by the end of the conversation compared to its start. For example, if the start of the conversation was very negative (-2) and the end was positive (1), the conversation moved three steps toward positivity and the CPC provides the value 3 as the conversation label. Values from 1 to 4 reflect the degree of positive change in the customer’s attitude by the end of the conversation, and values from -1 to -4 reflect the degree of negative change. The larger the magnitude of a negative CPC, the more negative the change in polarity; the larger a positive CPC, the more positive the change. The CPC is zero when the sentiment of the last tweet is the same as that of the first one, as in the example shown in Table 1, where both are -1.
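The calculation can be illustrated with a short sketch. The function name and the validation of the label range are assumptions; the arithmetic is the end-minus-start difference described for Equation 1, which matches the worked example in the text (-2 to 1 gives 3).

```python
def conversation_polarity_change(start_label, end_label):
    """Return the CPC: end-tweet sentiment minus start-tweet sentiment.

    Both labels are expected on the five-level scale from -2 (very
    negative) to 2 (very positive), so the result lies in [-4, 4].
    """
    for label in (start_label, end_label):
        if label not in range(-2, 3):
            raise ValueError(f"sentiment label out of range: {label}")
    return end_label - start_label

print(conversation_polarity_change(-2, 1))   # very negative -> positive: 3
print(conversation_polarity_change(-1, -1))  # Table 1 example, no change: 0
```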
Table 2. Conversation Polarity Change (CPC) description
| Value of Polarity Change (CPC) | Polarity Change |
| --- | --- |
| CPC > 0 (+1 to +4) | Positive Change (+1) |
| CPC < 0 (-1 to -4) | Negative Change (-1) |
| CPC = 0 | No Change |
When Equation 1 was applied to the extracted set of conversations, a scarcity problem appeared in the set of labeled conversations. Therefore, it was useful to merge some values and map them into three classes that represent the CPC as Positive, Negative, or No Change, as shown in Table 2. Negative CPC values ranging from -4 to -1 are grouped into one class indicating a negative change of polarity, positive values are merged into one class indicating a positive change of polarity, and zero remains in a separate “no change” class.
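A minimal sketch of this mapping is given below. The class names and the sample CPC values are invented for illustration; they are not taken from the dataset.

```python
from collections import Counter

def cpc_class(cpc):
    """Map a raw CPC value in [-4, 4] to one of three classes."""
    if cpc > 0:
        return "positive"
    if cpc < 0:
        return "negative"
    return "no change"

# Hypothetical raw CPC values: the extreme values are rare, so nine raw
# labels would leave some classes with very few examples.
raw_cpc = [0, 0, 1, -1, 0, 2, -3, 0, 1, 4]
print(Counter(raw_cpc))
print(Counter(cpc_class(v) for v in raw_cpc))
```

Grouping the nine raw values into three classes trades granularity for more examples per class.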
3.4 Feature Groupings
As mentioned earlier, the main target of this research is to identify how assistance from a well-known retail brand can affect its customers positively or negatively, regardless of the unpleasant complaints sent to it through the online conversation. This is achieved by extracting different attributes that represent both the online conversation and the source user, as mentioned in section 3.2. The extracted features are split into the following groups, each stored in a separate vector to be used in the experimental section: Conversation Content Features (CCF), Conversation Activities Features (CAF), Conversation Interaction Features (CIF), and Source User Features (SUF). The CCF are related to the content of the whole conversation, such as text and links. The CAF are related to the activities of the conversation itself, such as its time of creation and its length. The CIF are related to the interaction between the source users (authors) and other users involved in the conversation, such as retweets, mentions, and the other users participating in the whole conversation. Finally, the SUF consider the characteristics of the customer (source tweet author). The details of each feature group are given below.
3.4.1 Conversation Content Features (CCF):
An important aspect of a conversation is its content, which is represented by text, images, links, hashtags, etc. Furthermore, since sentiments are closely related to how people behave in different contexts, it is important to include the sentiment of the first tweet in the conversation in order to detect how the conversation’s polarity changes towards the end. Sentiment is defined as an attitude, thought, or judgment prompted by feeling [30], and it is typically measured as positive, negative, or neutral. Therefore, the following features are considered here:
- Sentiment of the Source Tweet: which is a sentiment label given to the first tweet of the conversation.
- Hashtags: a binary value indicating if the conversation contains hashtag(s).
- Media: a binary value indicating if the conversation contains any multimedia.
3.4.2 Conversation Activities Features (CAF):
This group of features is concerned with the body of the conversation:
- Time Span: the duration of the whole conversation, calculated as the difference between the start time of the conversation and its end time. This feature is important because, for example, a customer whose initial contact with a human agent falls during peak hours faces a long waiting time, and this experience consequently affects what the customer expects in subsequent encounters [31].
- Conversation Length: the total number of tweets in the whole conversation.
- Average User Tweet Length: the average length of the tweets of the source user (author).
- Average Amazon Tweet Length: the average length of the tweets of AmazonHelp.
- Time Till First Reply: the time between the first tweet from the user and the first reply from AmazonHelp.
- Number of First Tweets: the number of consecutive tweets (if any) posted at the beginning of the conversation before the first reply from the service provider.
The last two features are included because they give insight into how fast the service provider responds. Modern customer service providers should be able to react professionally to customers and to be proactive, for example by answering clarifying questions when asked and by knowing what to do when they do not have the answer right away. These skills all contribute to a positive customer perception.
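The response-related and length-related features of this group could be derived as in the sketch below. It is an illustration under assumptions: the record fields and function name are hypothetical, tweet length is measured in characters, and times are reported in seconds, none of which is specified in the text.

```python
from datetime import datetime

def activity_features(tweets, source_user, provider="AmazonHelp"):
    """Compute a subset of the CAF from chronologically sorted tweets.

    Each tweet is assumed to be a dict with 'author', 'time' (datetime)
    and 'text'; the first tweet is the source tweet.
    """
    def average_length(author):
        lengths = [len(t["text"]) for t in tweets if t["author"] == author]
        return sum(lengths) / len(lengths) if lengths else 0.0

    # Index of the provider's first reply, or None if it never replied.
    first_reply = next(
        (i for i, t in enumerate(tweets) if t["author"] == provider), None
    )
    if first_reply is None:
        time_till_first_reply = None
        opening = tweets
    else:
        delta = tweets[first_reply]["time"] - tweets[0]["time"]
        time_till_first_reply = delta.total_seconds()
        opening = tweets[:first_reply]

    return {
        "avg_user_tweet_length": average_length(source_user),
        "avg_provider_tweet_length": average_length(provider),
        "time_till_first_reply": time_till_first_reply,
        "num_first_tweets": sum(1 for t in opening if t["author"] == source_user),
    }

thread = [
    {"author": "customer", "time": datetime(2021, 1, 1, 10, 0, 0), "text": "abcd"},
    {"author": "customer", "time": datetime(2021, 1, 1, 10, 0, 1), "text": "ab"},
    {"author": "AmazonHelp", "time": datetime(2021, 1, 1, 10, 10, 0), "text": "hello!"},
    {"author": "customer", "time": datetime(2021, 1, 1, 10, 12, 0), "text": "ok"},
]
print(activity_features(thread, source_user="customer"))
```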
3.4.3 Conversation Interaction Features (CIF):
For each conversation thread in the data set that contains three or more tweets between the customer and the service provider, further metadata are extracted regarding the interaction inside the conversation. This metadata includes the following features:
- Favorites: Number of people who favorited/liked any of the thread tweets which indicates the degree of influence of the tweet (how acceptable the conversation is).
- Retweets: Number of users who shared this tweet (source tweet).
- Number of Replies: The number of reply tweets from external users.
- Number of External Users: Number of users included in the thread tweets other than Amazon and the source user.
In any conversation between a customer and a retailer, many external users may be involved. It is important to find out whether they affect the flow of the conversation; therefore, only their number is included.
3.4.4 Source User Features (SUF):
The tremendous positive/negative effect of reviews on product sales enables online retailers to manipulate the information presented on conversation platforms through the posts of the users who initiate the conversations, each of whom may be a regular customer, an influencer, or a sponsor. Therefore, it is important to extract features that represent the source user (main customer) during the time interval of the conversation. The features related to the source user are:
- Number of source user’s followers: the number of people who currently follow the source user, which is used to measure the popularity of this user within the network.
- Number of source user’s followees: the number of people who are currently followed by the source user.
- Verified: whether the source user has a verified account.
[1] Twitter API v1.1: https://developer.twitter.com/en/docs/twitter-api