Transformer-based Information Extraction from Twitter Text on Complaint Monitoring System

doi:10.21203/rs.3.rs-3222858/v1

Customer complaints receive close attention from companies because poorly handled complaints cause the loss of existing customers. Customers increasingly use social media to report complaints and often include the location of the problem. This location also needs to be extracted, because it can serve as authentic geographic evidence in a complaint monitoring system. The number of complaint reports on social media keeps increasing, which makes manual complaint monitoring slow and inefficient. This study proposes an automatic complaint monitoring system for Indonesian-language tweets based on transformer-based information extraction. The system combines a Bidirectional Encoder Representations from Transformers (BERT) model, which extracts location information from complaint tweets, with a Convolutional Neural Network (CNN) model, which classifies the complaint type; this combination achieves the highest F1 score, 0.90471. The system also visualizes complaint locations on a map, making it easier for companies to find where complaints were reported and to act on public complaints immediately.

Complaint Monitoring System

Information Extraction

Twitter Data

Named Entity Recognition

Deep Learning

Poor handling of complaints can lead to the loss of existing customers and cause them to spread negative word of mouth, which significantly damages a company's reputation. Today's society spends much of its time on the internet and on social media such as Twitter and Facebook [1], [2]. This habit has shifted how customers complain, from offline to online, so complaint handling procedures must be updated accordingly. Because Twitter offers fast response times, customers use it to express dissatisfaction with the services of both privately owned and government-owned companies [2]. However, the number of complaint reports on Twitter keeps increasing [3]. Manual complaint monitoring is inefficient and slow because there are many complaints and their topics are sparsely distributed [4]. An automatic complaint monitoring system is therefore needed to help manage complaint tweets so that action can be taken immediately.

Several previous studies have built complaint monitoring systems from Twitter data, such as a machine learning-based complaint monitoring system for farmers [5] and complaint monitoring systems for public services [6], [7]. Nevertheless, little research has addressed monitoring systems that extract information from a complaint tweet to identify the location entity where the complaint occurred. Information extraction is necessary because Twitter posts are unstructured [8]. In addition, social media users increasingly include the location of the problem when reporting complaints on Twitter, so the complaint location also needs to be extracted.

Location information in a complaint tweet serves as authentic geographic evidence in the complaint monitoring system, allowing the company to find out where a problem with its services to the community has occurred [9]. Complaints can arise from events such as power outages, damaged roads, water quality problems, or internet problems. An event is defined as something that occurs at a particular time and place and is often associated with entities such as people and locations [10]. A piece of information therefore qualifies as an event only if it contains a location entity indicating where the incident occurred. Further analysis using information extraction is thus needed to obtain the complaint location. This location information also helps filter complaint tweets to avoid data redundancy, and it can be used to visualize complaint locations so that companies can handle complaints directly.

Information extraction methods have developed through three phases: rule-based, statistical machine learning-based, and deep learning-based methods. In recent years, deep learning methods have been widely used because they automatically transform the feature representation of sample data from low to high level through layer-by-layer feature transformations, which makes classification easier [11]. Deep learning approaches based on transformer models have shown outstanding performance owing to their self-attention mechanism. Information extraction can be treated as a sequence-to-sequence problem, and self-attention is well suited to computing representations of the tokens in a sequence that take their positions into account [12].

This research aims to build a complaint monitoring system for Indonesian-language tweets using transformer-based information extraction, which is used to find the complaint location in the tweet data. The transformer model is a neural network consisting of two separate modules, an encoder and a decoder. The encoder takes the complaint tweet as input, and the decoder generates label predictions for the named entity recognition task. Complaint locations are visualized to make it easier for companies to find where complaints were reported so that public complaints can be handled immediately.

Related Work

A complaint is a report from a consumer that provides information about a problem with a product or service. The term implies resentment or dissatisfaction that the consumer feels toward the person, product, or organization responsible [7]. Both private and government public service providers can be the target of complaints. Public service refers to activities carried out by an organization to meet public needs and improve the community's quality of life [13]. Over time, more and more complaints have arrived online through social media platforms, producing a large volume of complaint text. Fast identification of consumer complaint text using natural language processing can help public service providers analyze and handle consumer complaints [7].

Several previous studies have proposed complaint monitoring methods. Neogi et al. implemented a monitoring system that applies machine learning to farmers' complaint texts on Twitter [5]. Hacohen-Kerner et al. proposed a complaint monitoring method for public services such as insurance companies and cellular communication companies, using four machine learning methods: Bayes Networks, SMO, SimpleLogistic, and Random Forest [4]. Pratama and Purwarianti built a complaint monitoring system for Bandung city government services in Indonesia using machine learning approaches such as SMO, Multinomial Naïve Bayes, and Random Forest [3]. Kim and Hong [13] proposed a complaint monitoring system for bike-sharing public services in South Korea that uses deep learning to classify complaint texts on Twitter as positive or negative. Singh et al. proposed a monitoring method for several public services using a graph-based semi-supervised approach [7].

Based on the existing research, no prior work has built a complaint monitoring system for Indonesian text that extracts information to determine the complaint location so that action can be taken as soon as possible. In this study, Named Entity Recognition (NER) is used to identify the complaint location in Indonesian complaint texts on Twitter. In recent years, NER has often been performed with deep learning models such as convolutional neural networks, which have been shown to outperform traditional machine learning methods [12]. However, such a neural network must be trained from scratch for each specific task and goal, which takes considerable time and resources. Pre-trained models address this problem: because they are first trained on large datasets and then adapted to the target task, they capture more contextual information [14].

Pre-trained models fall into two categories, non-contextual and contextual; both learn universal language representations and avoid training new models from scratch. Word2Vec and GloVe are examples of pre-trained models that learn non-contextual word embeddings, which use low-dimensional vectors to describe word meanings. The weakness of non-contextual pre-trained models is that they capture only a single representation for each word and cannot represent words according to the context of the sentence. ELMo (Embeddings from Language Models) and BERT (Bidirectional Encoder Representations from Transformers) were developed to overcome this weakness by learning contextual word embeddings. ELMo is rarely used because it relies on a Bi-LSTM encoder and requires a long training time [14].

BERT outperforms Word2Vec, GloVe, and ELMo because its transformer encoder represents every word using both its left and right context, so it can understand the context of a text as a whole. However, the [MASK] tokens that BERT sees during pre-training never appear during fine-tuning, and this mismatch introduces errors: the resulting word vectors may not represent information accurately, which can reduce accuracy on the NER task. XLNet is a newer transformer-based pre-trained model proposed by Google Brain. XLNet uses an autoregressive language model that can still learn context in both directions, avoiding the weaknesses of BERT's masking method [14].

In this work, a complaint monitoring system is built for Indonesian-language complaint texts on Twitter using transformer-based location information extraction with BERT and XLNet. Location information helps filter complaint tweets to avoid data redundancy and can be used to visualize complaint locations. Visualizing complaint locations makes it easier for both ordinary people and companies to find where complaints were reported so that immediate action can be taken [8], [15]. Several studies have shown that complaint text on Twitter can be used to determine location. However, there are obstacles: tweet geotags can be inconsistent, and the location field in a user's profile cannot be treated as the current location because it may be outdated. NER and gazetteer approaches can overcome these problems and are widely used to predict where the complaint in a tweet occurred [16], [17].

The complaint monitoring system developed in this study consists of five main stages: data preprocessing, data annotation, location extraction, complaint type classification, and data visualization. Figure 1 shows a flowchart of the proposed complaint monitoring system. Each stage is explained in the following subsections.

Data Preparation

The input data come from Twitter and were collected with Tweepy, an open-source Python library that accesses the Twitter API directly, using a personal access token for authentication [5]. Data collection took place from December 2020 to March 2021 and focused on Surabaya City in Indonesia. Surabaya was chosen because it is one of the largest cities in Indonesia and its population keeps growing, so the city government makes serious efforts to serve its citizens by providing public service facilities that create a sustainable urban environment [18]. The transformer-based complaint monitoring system targets complaint tweets in Surabaya to help the city government manage public complaints.

A total of 8,500 tweets were collected over four months from several official Twitter accounts. A Twitter account's credibility is associated with a large number of followers, especially when the tweet comes from a popular user such as the official account of a government-owned company [19]. Table 1 lists the official Twitter accounts used in this study, and Table 2 lists the keywords used to filter complaint tweets.

Table 1 Official Twitter Account Details Used in This Study

Username	Description	Number of Tweets	Number of Followers
@e100ss	The official Twitter account of Radio Suara Surabaya, a radio station that presents national news and news for Surabaya and the surrounding areas.	535,600	975,700
@sits_dishubsby	The official Twitter account of the Surabaya City Transportation Agency for the traffic control system.	127,300	254,300
@SapawargaSby	The Surabaya city government's official Twitter account, managed by the Surabaya City Communication and Information Service.	29,600	105,000
@MNCPlayID	MNC Play's official Twitter account. MNC Play is one of Indonesia's internet and subscription cable TV providers.	127,600	51,900
@MNCPlaySBY	MNC Play's official Twitter account for the Surabaya and Sidoarjo areas.	2,772	1,229
@IndiHome	IndiHome's official Twitter account. IndiHome provides internet, home telephone, and interactive television services in Indonesia.	931,600	256,000
@pln_123	The official Twitter account of PT PLN, a government-owned company that handles all aspects and problems of electricity in Indonesia.	1,600,000	618,300
@PDAMSurabaya	The official Twitter account of the Indonesian government-owned company that distributes clean water, especially in Surabaya City.	18,600	31,700

Table 2 Explanation of Complaint Types, along with Keywords and Sample Tweets for Each Complaint Type

Complaint Type	Description	Twitter Account for Crawling Data	Search Keywords	Sample Tweets
Road Damage	Complaints about damaged roads or traffic conditions.	@e100ss, @sits_dishubsby, @SapawargaSby	“jalan rusak”, “jalan berlubang”	“Kondisi jalan rusak parah di kalianak surabaya, mobil2 pada patah as roda” ("The road in Kalianak, Surabaya, is badly damaged, and cars are breaking their axles")
Internet	Complaints about internet or Wi-Fi problems.	@MNCPlayID, @MNCPlaySBY, @IndiHome	“internet error”, “wifi lemot”, “wifi tidak bisa, internet lambat”	“apakah daerah bandung sedang ganguan? soalnya ini internet ga nyala dari pagi” ("Is there an outage in the Bandung area? The internet has not been working since this morning")
Water Quality	Complaints about water problems caused by disruptions at the government-owned clean water company.	@PDAMSurabaya	“PDAM mati”, “PDAM”, “gangguan pdam, air tidak mengalir”, “pdam tidak keluar air”	“Mohon info ini air mati di daerah Wisma Lidah Kulon dan sekitarnya sampai kapan dan jam berapa? Terima kasih” ("Please tell me until when, and until what time, the water will be off in the Wisma Lidah Kulon area and its surroundings. Thank you")
Power Outages	Complaints about electricity not turning on or a building having no electricity.	@pln_123	“mati listrik”, “listrik mati”, “mati lampu”, “pemadaman listrik”, “pengaduan pln”, “pln mati”, “pemadaman bergilir”	“Daerah ketintang permai listriknya padam, tolong segera ditangani” ("The electricity is out in the Ketintang Permai area; please handle it immediately")
Non-complaint	Tweets that do not express consumer dissatisfaction with a service.	-	-	“min.. bs tolong di cek lampu padam di rumah kami.” ("min.. can you please check the lights that are out in our house.")
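The keyword filtering behind Table 2 can be sketched as follows. This is a minimal illustration, not the authors' crawler. It assumes case-insensitive substring matching, treats comma-separated entries inside one quoted keyword of Table 2 as separate keywords (our reading of the table), and builds a Twitter API v2 search query string of the kind that could be passed to Tweepy's `Client.search_recent_tweets(query=...)`. The helper names `build_search_query` and `match_complaint_types` are hypothetical.

```python
from __future__ import annotations

# Keywords from Table 2. Entries such as "wifi tidak bisa, internet lambat"
# are split into two keywords here (an assumption about the table).
COMPLAINT_KEYWORDS = {
    "Road Damage": ["jalan rusak", "jalan berlubang"],
    "Internet": ["internet error", "wifi lemot", "wifi tidak bisa", "internet lambat"],
    "Water Quality": ["PDAM mati", "PDAM", "gangguan pdam", "air tidak mengalir",
                      "pdam tidak keluar air"],
    "Power Outages": ["mati listrik", "listrik mati", "mati lampu", "pemadaman listrik",
                      "pengaduan pln", "pln mati", "pemadaman bergilir"],
}


def build_search_query(keywords: list[str]) -> str:
    """Join keywords as exact phrases with OR and exclude retweets (API v2 syntax)."""
    phrases = " OR ".join(f'"{kw}"' for kw in keywords)
    return f"({phrases}) -is:retweet"


def match_complaint_types(text: str) -> list[str]:
    """Return every complaint type whose keywords occur in the tweet (hypothetical rule)."""
    lowered = text.lower()
    return [ctype for ctype, keywords in COMPLAINT_KEYWORDS.items()
            if any(kw.lower() in lowered for kw in keywords)]


if __name__ == "__main__":
    print(build_search_query(COMPLAINT_KEYWORDS["Road Damage"]))
    print(match_complaint_types("PLN mati lagi di rungkut"))
```

Substring matching is deliberately crude: the Power Outages sample tweet in Table 2 ("listriknya padam") matches none of the listed keywords, which is one reason a trained classifier is still needed after crawling.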

Data Preprocessing

The crawled data are then preprocessed. Because the collection of tweet texts is large, it must be cleaned to avoid variations that would make the data inconsistent [5], [20]. Preprocessing removes unneeded parts such as URL links, mentions (written with “@”), and retweet markers (usually written as "RT"). Tokenization splits each tweet into smaller units called tokens [5]. Case folding converts tweets to lowercase [21].
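The cleaning, case-folding, and tokenization steps above can be sketched with regular expressions. This is a minimal sketch under our own assumptions, not the study's preprocessing code: it removes "RT" only as a leading retweet marker followed by a mention, because "RT" (Rukun Tetangga) is also an Indonesian address unit that may carry location information, and it keeps numbers such as times ("09.35") as single tokens.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Only a leading "RT @user" marker is treated as a retweet flag, because "RT"
# is also an Indonesian address unit (Rukun Tetangga) inside normal text.
RETWEET_RE = re.compile(r"^\s*RT\s+(?=@\w)")
MENTION_RE = re.compile(r"@\w+")
# Numbers with separators (e.g. times like 09.35) are kept as one token.
TOKEN_RE = re.compile(r"\d+(?:[.:,]\d+)*|\w+")


def clean_and_tokenize(tweet: str) -> list:
    text = URL_RE.sub(" ", tweet)      # drop URL links
    text = RETWEET_RE.sub("", text)    # drop the leading retweet marker
    text = MENTION_RE.sub(" ", text)   # drop @mentions
    text = text.lower()                # case folding
    return TOKEN_RE.findall(text)      # tokenization (punctuation is discarded)


print(clean_and_tokenize("RT @pln_123: Listrik MATI di Ketintang jam 09.35 https://t.co/abc"))
```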

Text normalization converts non-standard or informal language into standard forms. Slang refers to the informal language commonly used in online conversations, for example on Twitter. Slang words are formed from a term, an abbreviation, or a combination of both. Because informal language is short and easy to use, Twitter users adopt it in online communication. Slang words make it difficult for machines to analyze and understand word meanings [22]. For example, informal Indonesian shortens words by removing vowels, so the word "tidak" (no) becomes "tdk" [23]. Slang words therefore need to be converted into their standard forms. Preprocessed tweets are stored in a log format.
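Dictionary-based slang normalization can be sketched as a token-by-token lookup. The dictionary below is illustrative and hypothetical (only "tdk" to "tidak" comes from the text), and the rule that expands informal reduplication such as "mobil2" into "mobil-mobil" is our own assumption about a common Indonesian pattern, not a step the study describes.

```python
import re

# Illustrative slang dictionary (hypothetical; the study's own normalization
# dictionary is not listed). "tdk" -> "tidak" is the example from the text.
SLANG = {
    "tdk": "tidak", "ga": "tidak", "yg": "yang", "bs": "bisa",
    "kyk": "seperti", "dah": "sudah", "bgt": "sekali",
}
# Informal reduplication, e.g. "mobil2" for "mobil-mobil" (assumed rule).
REDUP_RE = re.compile(r"^([a-z]{3,})2$")


def normalize_tokens(tokens, slang=SLANG):
    """Replace slang tokens with their standard forms; unknown tokens pass through."""
    normalized = []
    for tok in tokens:
        match = REDUP_RE.match(tok)
        if match:
            tok = f"{match.group(1)}-{match.group(1)}"
        normalized.append(slang.get(tok, tok))
    return normalized


print(normalize_tokens(["internet", "ga", "nyala", "dari", "pagi"]))
```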

Data Annotation

Data annotation produces the labeled data needed to train the supervised models [15]. A total of 1,627 tweets were annotated by a single annotator, the fourth author of this study, in three steps. First, the tweets are labeled with location labels in the BIO tag format to build the location extraction model. The prefix B- (Begin) marks the first word of an entity, and the prefix I- (Inside) marks each subsequent word of the same entity [24], [25]. The location labels used in this study are LOC, GPE, BLD, HWYMSE, MSE, NPL, TIME, DATE, and OBJ, and the O (Other) label indicates that a word is not part of any entity. Table 3 defines the location labels.
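The BIO scheme described above can be illustrated by converting annotated entity spans into per-token tags and back. This is a minimal sketch, not the annotation tool used in the study; the spans in the usage example (which tokens of a Table 4 sentence are LOC, HWYMSE, or GPE) are our own illustrative labeling, not taken from the dataset.

```python
def spans_to_bio(num_tokens, spans):
    """spans: (start, end, label) with end exclusive and no overlaps."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span {(start, end, label)}")
        tags[start] = f"B-{label}"          # first word of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining words of the entity
    return tags


def bio_to_spans(tags):
    """Decode BIO tags back into (start, end, label) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.append((start, i, label))
                label = None
            if tag != "O":  # B-X, or an I-X without a matching B-X
                start, label = i, tag[2:]
    return spans


tokens = "jalan berlubang tol gempol pada km20 arah surabaya".split()
spans = [(2, 4, "LOC"), (5, 6, "HWYMSE"), (7, 8, "GPE")]  # illustrative labels
print(list(zip(tokens, spans_to_bio(len(tokens), spans))))
```

Decoding tags back into spans is also what entity-level evaluation needs, since a multi-word entity such as "tol gempol" should count as one unit.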

Second, tweets are assigned to five categories to build the complaint type classification model. The categories span different industries and areas of activity so that they represent common public complaints in Indonesia about public facility services. Table 2 defines each complaint type label. Third, tweets are annotated with relation labels to build a relation extraction model, which extracts relations between the entities identified in the first step. The relation labels are Highway-Position, Street-Place, StartingPoint-Destination, and Other; Table 4 explains them in detail.

Table 3 Entity Labels For Location Extraction in Complaint Monitoring System

Entity Label	Description	Word or Phrase Example
LOC (Location)	Non-GPE locations, such as street names	Kertajaya, Gubeng
GPE (Geographical Entity)	City name, country name	Surabaya, Malang, Blitar
BLD (Building)	Building name	Taman Pelangi, Taman Ekspresi (Rainbow Park, Expression Park)
NPL (Natural Place)	Natural Place name	Sungai Brantas, Gunung Bromo (Brantas River, Mount Bromo)
HWYMSE (Highway Measurement)	Unit kilometers on the highway	Km 20, KM 120
OBJ (Object)	Terms of things, not people	Truk, Mobil, Motor (Trucks, Cars, Motorcycles)
MSE (Measurement)	The unit of measure for an object, for example, the strength of an earthquake	1 Km, 7.2 SR, 25 cm
TIME	Time expressions smaller than a day or date (clock times)	15.35, 16:20
DATE	Absolute date or period	7-February-2023, 1/1/2021
O (Other)	Other entities besides location, date, and time	Saya, dimana, syukurlah (I, Where, Thank Goodness)

Table 4 Types of Relations in The Manual Labeling Process for Relation Extraction

Relation Label	Description	Labeling Example
Highway-Position	The relationship between LOC (the name of a place on the highway) and HWYMSE (a kilometer unit on the highway).	Jalan berlubang tol Gempol pada km20 arah Surabaya (A potholed road on the Gempol toll road (LOC) at km20 (HWYMSE) towards Surabaya)
Street-Place	The relationship between LOC (street name) and LOC (place name)	Jalan darmo aja Surabaya jalan utama yaa rusak sebelum TL darmo (Even Darmo Road (LOC-Street name), a main road in Surabaya, is damaged before TL Darmo (LOC-Place name))
StartingPoint-Destination	The relationship between LOC (the name of the starting point) and LOC (the name of the destination)	Poris ke Green Lake jalannya sebagian ada yang rusak. (Some parts of the road from Poris (LOC-Starting Point) to Green Lake (LOC-Destination) are damaged.)
Other	No relationship	Internet untuk daerah Keputih masih gangguan ya min? (Is the internet in the Keputih area still disrupted, admin?)
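Relation extraction typically starts from candidate entity pairs that a relation classifier then labels with one of the relations in Table 4 (or Other). The sketch below only generates those candidates; the classifier is not shown, and the allowed type pairs are our reading of Table 4 (the kilometer marker is accepted as either MSE or HWYMSE). The name `candidate_pairs` is hypothetical.

```python
from itertools import combinations

# Entity-type pairs that Table 4's relations connect, as (head, tail).
# This schema is an assumption drawn from Table 4, not the authors' code.
ALLOWED_TYPE_PAIRS = {("LOC", "LOC"), ("LOC", "MSE"), ("LOC", "HWYMSE")}


def candidate_pairs(entities, allowed=ALLOWED_TYPE_PAIRS):
    """entities: (start, end, label) spans; returns (head, tail) pairs in text order."""
    ordered = sorted(entities)
    return [(head, tail) for head, tail in combinations(ordered, 2)
            if (head[2], tail[2]) in allowed]


# "jalan berlubang tol gempol pada km20 arah surabaya" with illustrative spans
print(candidate_pairs([(2, 4, "LOC"), (5, 6, "HWYMSE"), (7, 8, "GPE")]))
```

Keeping pairs in text order matters for directional relations such as StartingPoint-Destination, where the first LOC is the start and the second is the destination.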

Location Extraction

Location entities in complaint tweets can be extracted with a Named Entity Recognition (NER) task [16], [26]. The NER models are transformer-based because transformers have shown excellent performance owing to their self-attention mechanism [12]. The BERT and XLNet models are trained on the complaint tweet data to recognize entities such as location, geographical entity, building, highway measurement, natural place, time, date, object, measurement, and other entities.
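Because BERT and XLNet split words into subword tokens, word-level BIO labels must be aligned to those tokens before training a token-classification head. The paper does not state its alignment strategy; the sketch below assumes the common convention of labeling only the first subword of each word and giving other subwords and special tokens the value -100, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The `word_ids` list mimics what Hugging Face fast tokenizers return from `BatchEncoding.word_ids()`; the subword split in the example is hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids, word_labels, label2id, label_all_subwords=False):
    """Map word-level BIO tags onto subword tokens.

    word_ids: one entry per subword token, giving the index of the word it
              came from, or None for special tokens such as [CLS] and [SEP].
    """
    aligned, previous = [], None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)                 # special token
        elif wid != previous:
            aligned.append(label2id[word_labels[wid]])   # first subword of a word
        else:
            aligned.append(label2id[word_labels[wid]] if label_all_subwords
                           else IGNORE_INDEX)            # later subwords
        previous = wid
    return aligned


label2id = {"O": 0, "B-LOC": 1, "I-LOC": 2}
tags = ["B-LOC", "I-LOC", "I-LOC"]        # "jalan gubeng pojok"
word_ids = [None, 0, 1, 1, 2, None]       # hypothetical split of "gubeng"
print(align_labels(word_ids, tags, label2id))
```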

a. BERT model

BERT is pre-trained to represent each word using both its left and right context, so it can understand the context of a text from its entire surroundings [16]. BERT consists of a stack of encoder blocks, each of which is a transformer block. The BERT input is a token sequence of at most 512 tokens, represented as vectors. A special [CLS] token is added at the beginning of each input sequence, and a special [SEP] token separates the sequence into segments that indicate whether a token comes from sentence A or sentence B. A position embedding is also added to each token, so the input representation of a token is the sum of its token, segment, and position embeddings. These vectors then pass through the self-attention layer and the feed-forward network of each block. Finally, an output layer stacked on top of BERT's final token representations predicts the location entity label of each token [27].
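The input format described above can be sketched as follows: the function assembles `[CLS] A [SEP] (B [SEP])`, the matching segment ids, and position ids, and truncates to the 512-token limit. It is a simplified illustration that assumes the input is already tokenized; real BERT tokenizers also map tokens to vocabulary ids. Trimming the longer segment first follows the original BERT reference implementation's pair truncation; the function name is hypothetical.

```python
def build_bert_input(tokens_a, tokens_b=None, max_len=512):
    """Assemble [CLS] A [SEP] (B [SEP]) with segment and position ids."""
    if max_len < 3:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    a, b = list(tokens_a), list(tokens_b or [])
    budget = max_len - (3 if b else 2)   # room for [CLS] and the [SEP] tokens
    while len(a) + len(b) > budget:      # trim the longer segment first
        (a if len(a) >= len(b) else b).pop()
    tokens = ["[CLS]"] + a + ["[SEP]"]
    segment_ids = [0] * len(tokens)      # sentence A
    if b:
        tokens += b + ["[SEP]"]
        segment_ids += [1] * (len(b) + 1)  # sentence B
    position_ids = list(range(len(tokens)))
    # Inside BERT, the input vector at position i is
    # token_emb[tokens[i]] + segment_emb[segment_ids[i]] + position_emb[position_ids[i]].
    return {"tokens": tokens, "segment_ids": segment_ids, "position_ids": position_ids}


print(build_bert_input(["listrik", "padam"]))
```

For single-tweet NER only segment A is used, so every segment id is 0; the pair form matters for tasks that compare two texts.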

b. XLNet model

XLNet uses a Permutation Language Model (PLM) to combine the advantages of autoencoding and autoregressive methods. BERT is an autoencoding method: certain words in the input sentence are masked, and the model learns to reconstruct them. GPT is an autoregressive method that uses the transformer's decoder to predict the output. PLM randomly permutes the factorization order of the words in a sentence and masks the last few words in that order; each masked word is then predicted autoregressively from the words that precede it in the permuted order. XLNet also adopts the recurrence mechanism and relative position encoding of Transformer-XL, which cache the hidden states of previous segments as memory and encode relative positions consistently across different permutations. As a result, XLNet can enrich the contextual information of long sentences by representing each token according to the semantics of the sentence [14].
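The permutation objective can be illustrated by showing which positions each predicted token is allowed to see. This sketch only computes the factorization order, the prediction targets (the last `num_predict` positions of the order), and their visible context; it does not implement XLNet's two-stream attention or any model. Note that a target's context can include words to its right in the original sentence, which is how PLM obtains bidirectional context without [MASK] tokens.

```python
import random


def permutation_lm_targets(tokens, num_predict, seed=0):
    """Sample a factorization order and build partial-prediction targets.

    For each of the last `num_predict` positions in the order, return the
    original positions it may attend to (those earlier in the order).
    """
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)
    first_target = max(0, len(order) - num_predict)
    targets = []
    for step in range(first_target, len(order)):
        pos = order[step]
        context = sorted(order[:step])  # may lie left or right of pos
        targets.append({"position": pos, "token": tokens[pos], "context": context})
    return order, targets


order, targets = permutation_lm_targets("listrik di ketintang permai padam".split(), 2, seed=1)
print(order)
for t in targets:
    print(t)
```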

Complaint Type Classification

Convolutional Neural Network (CNN) and Convolutional Long Short-Term Memory (CLSTM) models are used to classify the complaint type of each complaint tweet. Classification is performed after location entities are identified because event information must contain at least one location entity [10]. Table 5 lists the hyperparameter settings used in this study.

a. Convolutional Neural Network (CNN)

A CNN consists of multiple neural network layers, each producing many features. In the convolutional layer, each filter with a given kernel size is convolved over the input. A max-pooling layer then takes the maximum value of the feature map produced by each filter. Dropout is applied next, randomly deactivating features during training to prevent overfitting. The cross-entropy loss function is used because it is well suited to classification tasks and measures how well the model's predicted class probabilities match the true labels.
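The CNN pipeline above (convolution, max-pooling, dropout, softmax, cross-entropy) can be sketched as a NumPy forward pass for one tweet. This is an illustrative sketch, not the study's implementation: the kernel sizes, filter counts, random weights, and the omission of convolution biases are our assumptions, while the embedding size of 300, dropout of 0.5, and five output classes follow Table 5 and the class list in this section. No training loop is shown.

```python
import numpy as np


def text_cnn_forward(emb, filters, w_out, b_out, dropout=0.5, train=False, rng=None):
    """Forward pass of a single-layer text CNN.

    emb:     (seq_len, emb_dim) word embeddings of one tweet
    filters: {kernel_size: (num_filters, kernel_size, emb_dim) weights}
    w_out:   (num_classes, total_filters); b_out: (num_classes,)
    """
    pooled = []
    for k, w in filters.items():
        windows = np.stack([emb[i:i + k] for i in range(emb.shape[0] - k + 1)])
        feature_map = np.maximum(np.einsum("wkd,fkd->wf", windows, w), 0.0)  # conv + ReLU
        pooled.append(feature_map.max(axis=0))  # max-over-time pooling per filter
    h = np.concatenate(pooled)
    if train:  # inverted dropout, applied only during training
        if rng is None:
            rng = np.random.default_rng()
        h = h * (rng.random(h.shape) >= dropout) / (1.0 - dropout)
    logits = w_out @ h + b_out
    logits = logits - logits.max()  # numerical stability
    return np.exp(logits) / np.exp(logits).sum()  # softmax probabilities


def cross_entropy(probs, label):
    return -np.log(probs[label])


rng = np.random.default_rng(0)
emb = rng.normal(size=(12, 300))  # 12 tokens, 300-dim embeddings
filters = {k: rng.normal(scale=0.05, size=(4, k, 300)) for k in (2, 3, 4)}
probs = text_cnn_forward(emb, filters, rng.normal(size=(5, 12)), np.zeros(5))
print(probs, cross_entropy(probs, 0))
```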

b. Convolutional Long Short-Term Memory (CLSTM)

The first stage of CLSTM complaint type classification is N-gram feature extraction via one-dimensional convolution, in which a filter vector slides over the sequence and detects features at different positions. The input word vectors come from Word2Vec embeddings pre-trained on Indonesian Wikipedia; the convolution turns them into N-gram features, which are fed into the LSTM. The CLSTM model consists of one convolutional layer and one LSTM layer and is regularized with dropout and an L2 penalty on the softmax layer weights. The loss function is cross-entropy, and the LSTM is adopted because it can capture long-term dependencies between words in a sentence [28].
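The CLSTM structure can be sketched in NumPy: the convolution produces N-gram features without pooling, so their order is preserved, an LSTM reads that feature sequence, and its last hidden state feeds a softmax layer whose weights carry an L2 penalty. This is an illustrative forward pass under our own assumptions (single kernel size, gate ordering, random weights, and an L2 coefficient of 1e-3); dropout and training are omitted, and it is not the study's code.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv_features(emb, w):
    """1-D convolution over word embeddings, kept as an ordered sequence.

    emb: (seq_len, emb_dim); w: (num_filters, k, emb_dim)
    returns: (seq_len - k + 1, num_filters)
    """
    k = w.shape[1]
    windows = np.stack([emb[i:i + k] for i in range(emb.shape[0] - k + 1)])
    return np.maximum(np.einsum("wkd,fkd->wf", windows, w), 0.0)


def lstm_last_hidden(xs, wx, wh, b):
    """One LSTM layer; wx: (4H, F), wh: (4H, H), b: (4H,).

    Gate order (an assumption): input, forget, output, candidate cell.
    """
    hidden = wh.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in xs:
        z = wx @ x + wh @ h + b
        i = sigmoid(z[:hidden])
        f = sigmoid(z[hidden:2 * hidden])
        o = sigmoid(z[2 * hidden:3 * hidden])
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def clstm_loss(emb, label, params, l2=1e-3):
    """Cross-entropy plus an L2 penalty on the softmax weights (dropout omitted)."""
    feats = conv_features(emb, params["conv"])
    h = lstm_last_hidden(feats, params["wx"], params["wh"], params["b"])
    logits = params["w_soft"] @ h + params["b_soft"]
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label] + l2 * np.sum(params["w_soft"] ** 2)
```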

The CNN or CLSTM model assigns each tweet to one of five classes: power outages, damaged roads, water quality, internet, and non-complaint. Tweets labeled non-complaint are used only for performance evaluation and do not proceed to the data visualization stage. Tweets with a complaint label (power outages, damaged roads, water quality, or internet) proceed to both the data visualization and performance evaluation stages. Classifier performance is evaluated with precision, recall, and F1 score.
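Per-class precision, recall, and F1, and a support-weighted average of the kind reported in this study's tables, can be computed as below. This is a minimal sketch of the standard definitions (equivalent in spirit to scikit-learn's `average="weighted"`), not the evaluation script used in the study; the labels in the example are illustrative.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)}."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true label t
    report = {}
    for c in sorted(set(y_true) | set(y_pred)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = (prec, rec, f1, tp[c] + fn[c])
    return report


def weighted_f1(report):
    """Average per-class F1 weighted by each class's support (true count)."""
    total = sum(support for *_, support in report.values())
    return sum(f1 * support for _, _, f1, support in report.values()) / total


y_true = ["power", "power", "road", "internet"]
y_pred = ["power", "road", "road", "internet"]
report = per_class_prf(y_true, y_pred)
print(report, weighted_f1(report))
```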

Table 5 Hyperparameter Settings for Complaint Monitoring System

Parameter	Description	Value
Epochs	The number of complete passes the algorithm makes over the training data [21].	40
Layers	The number of layers in the network [29].	2
Learning Rate	The step size used when updating the weights to minimize the loss function.	1e-3
Embedding Size	The size of the vectors used to represent word embeddings.	300
Drop Out	The dropout rate used to regularize the neural network.	0.5
Loss Function	Function to assess the predictions of the classification model.	Cross-entropy

Data Visualization

The Twitter data are visualized on a website built with Laravel Framework 5.8.8, PHP 7.4.13, a MySQL database, and the Google Maps V3 API, which displays a map of complaint locations. Complaint locations are identified with the Named Entity Recognition method, and relations between the recognized named entities are then extracted. Additional gazetteer data were obtained from openstreetmap.org (OSM), limited to Surabaya in Indonesia. OSM outputs an XML file, from which the location ID, city, address, location name, latitude, and longitude are extracted with the help of the xmltree Python library.
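Extracting gazetteer records from an OSM XML export can be sketched with Python's standard-library `xml.etree.ElementTree`. We use it as a stand-in because the text names an "xmltree" library without further detail. The sketch reads only `<node>` elements that carry a `name` tag and uses the common OSM tag keys `addr:city` and `addr:street`; the sample document and its coordinates are placeholders, not real positions.

```python
import xml.etree.ElementTree as ET

# Placeholder OSM export; ids and coordinates are illustrative, not real positions.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="-7.0" lon="112.0">
    <tag k="name" v="Taman Ekspresi"/>
    <tag k="addr:city" v="Surabaya"/>
  </node>
  <node id="2" lat="-7.1" lon="112.1"/>
  <node id="3" lat="-8.0" lon="113.0">
    <tag k="name" v="Contoh Lain"/>
    <tag k="addr:city" v="Malang"/>
  </node>
</osm>"""


def parse_osm_nodes(xml_text, city=None):
    """Extract named OSM nodes as gazetteer records, optionally filtered by city."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter("node"):
        tags = {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        name = tags.get("name")
        if not name:
            continue  # unnamed nodes are not useful as gazetteer entries
        node_city = tags.get("addr:city")
        if city and node_city and node_city.lower() != city.lower():
            continue
        records.append({
            "id": node.get("id"),
            "city": node_city,
            "address": tags.get("addr:street"),
            "name": name,
            "latitude": float(node.get("lat")),
            "longitude": float(node.get("lon")),
        })
    return records


print(parse_osm_nodes(SAMPLE_OSM, city="Surabaya"))
```

Only OSM nodes carry coordinates directly; named ways (streets) would need their member nodes resolved, which this sketch leaves out.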

The extracted location data also indicate the source (starting location), destination (final location), and way (intermediate location), although not every tweet contains a way. A graph is used to determine which nodes are the source, destination, and way: the source corresponds to the root of the graph, the destination to the end of a branch, and the way to a node that connects the root to the final node.
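The graph rule above can be sketched with in-degree and out-degree counts: a node with no incoming edge is a source (root), a node with no outgoing edge is a destination (leaf), and a node with both is a way. This assumes the extracted locations have already been turned into directed edges in travel order (for example from StartingPoint-Destination relations); the function name and the generic nodes in the test are hypothetical.

```python
def location_roles(edges):
    """Classify nodes of a directed location graph.

    edges: (from_location, to_location) pairs in travel order.
    """
    indeg, outdeg = {}, {}
    for src, dst in edges:
        for node in (src, dst):
            indeg.setdefault(node, 0)
            outdeg.setdefault(node, 0)
        outdeg[src] += 1
        indeg[dst] += 1
    roles = {"source": [], "destination": [], "way": []}
    for node in indeg:  # dicts keep first-seen order
        if indeg[node] == 0:
            roles["source"].append(node)        # root of the graph
        elif outdeg[node] == 0:
            roles["destination"].append(node)   # end of a branch
        else:
            roles["way"].append(node)           # connects root to end
    return roles


print(location_roles([("poris", "green lake")]))
```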

After the location data are converted into a graph, geocoding converts ambiguous textual addresses into numerical geographic coordinates (latitude and longitude), which can be used to place markers on a map or show the position of an address on a map [30]. The geocoded locations are displayed as markers on the map, and marker colors distinguish complaint types.
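Gazetteer-based geocoding and marker creation can be sketched as a normalized dictionary lookup. The marker dictionaries use the `{lat, lng}` shape that the Google Maps JavaScript API accepts as a position literal, but the color scheme, the gazetteer contents, and the helper names are hypothetical; the paper only states that marker colors differ by complaint type. Non-complaint tweets are skipped, matching the pipeline described earlier.

```python
# Hypothetical colour scheme; the text only says colours differ by type.
MARKER_COLORS = {"Power Outages": "yellow", "Road Damage": "red",
                 "Water Quality": "blue", "Internet": "green"}


def geocode(name, gazetteer):
    """Look up (lat, lng) for a location name after lowercasing and collapsing spaces."""
    return gazetteer.get(" ".join(name.lower().split()))


def build_markers(complaints, gazetteer):
    markers = []
    for complaint in complaints:
        if complaint["type"] not in MARKER_COLORS:  # e.g. non-complaint tweets
            continue
        coords = geocode(complaint["location"], gazetteer)
        if coords is None:  # location not in the gazetteer
            continue
        lat, lng = coords
        markers.append({"position": {"lat": lat, "lng": lng},
                        "color": MARKER_COLORS[complaint["type"]],
                        "title": complaint["location"]})
    return markers


gazetteer = {"taman ekspresi": (-7.0, 112.0)}  # placeholder coordinates
print(build_markers([{"type": "Road Damage", "location": "TAMAN Ekspresi"}], gazetteer))
```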

Several evaluations are carried out in this study. First, location extraction from complaint tweets with the BERT and XLNet models is evaluated by comparing precision, recall, and F1 score. Second, the classification models are evaluated by comparing the performance of the CLSTM and CNN classifiers. Third, the data visualization is tested.

Location Extraction Evaluation

Table 6 compares the location entity labeling metrics of the XLNet and BERT models. BERT's weighted-average F1 score (0.98049) is about 2 percentage points higher than XLNet's (0.96206). XLNet's lower F1 score stems from its recurrence mechanism over permutation orders: XLNet reuses hidden states from previous segments, which mainly helps on longer text sequences [31]. Because complaint tweets tend to be short, XLNet is less suitable here and makes many errors when extracting locations from complaint tweets.

The XLNet model cannot identify BLD entities appropriately, so their F1 score is the lowest, at 0.59016. The low score arises because XLNet tends to mispredict BLD entities as LOC entities, where BLD denotes a building name and LOC denotes a location name. Table 7 details the errors made by the XLNet model by indicating the incorrectly predicted token in each sentence. In error example 3 in Table 7, the phrase "rs. wiyung" (Wiyung Hospital) is a building name (actual label BLD), but XLNet identifies it as a location name (predicted label LOC). XLNet predicts "wiyung" as a location because Surabaya City in Indonesia also has a street named "wiyung", namely "Jalan Raya Wiyung" (Wiyung Highway), which makes the building name ambiguous with the street name. Another cause of the error is the informal nature of complaint tweets, which contain many abbreviations such as "rs.", short for "rumah sakit" (hospital). XLNet therefore has difficulty recognizing "rs. wiyung" as a single building name in the city of Surabaya.

XLNet also fails to identify HWYMSE entities correctly and tends to label them as TIME entities. In error example 4 in Table 7, XLNet incorrectly predicts the word “11” in the phrase “km 11” as a TIME entity. The error arises because TIME entities are usually written as numbers, and HWYMSE entities, which give kilometer positions on toll roads, are also usually written as numbers. Such ambiguous words make it difficult for XLNet to predict location entity labels correctly.

XLNet also mispredicts LOC labels as OBJ labels, as in example 2 in Table 7. The phrase "jalan walikota mustajab" (Walikota Mustajab Street) is a single street name in Surabaya City, Indonesia. However, XLNet predicts the word "walikota" (mayor, the head of the municipality) as a separate OBJ (Object) entity. The model cannot correctly predict less common locations because their names rarely appear in the dataset used to train XLNet.

The BERT model also makes mistakes when extracting locations from complaint tweets. Table 8 shows examples of errors made by the BERT model in extracting location information. BERT cannot extract locations down to the village, district, or Kelurahan level in Surabaya City, as in example 3 in Table 8. The phrase "simomulyo baru" is a single unit naming one of the sub-districts of Surabaya, Indonesia, but BERT labels the word "baru" in "simomulyo baru" as an Other entity.

Neither BERT nor XLNet can identify the actual location of the complaint in case 1 of Tables 7 and 8, where both models mispredict a LOC entity as GPE. In example 1 of Table 7, the word "gresik" is the name of a city in East Java Province, Indonesia. However, the actual location of the roadwork is "tol gresik" (Gresik toll road), not just the city of Gresik, whereas XLNet predicts only the city "gresik" as the location. Further research is therefore needed to extract detailed information about the actual location of an event from text, which requires a broader scope for capturing location information and extra effort in annotating the dataset.

Table 6. Comparison of Evaluation Metrics for Labeling of Location Entities Using the XLNet and BERT Models

Entity	XLNet Precision	XLNet Recall	XLNet F1	BERT Precision	BERT Recall	BERT F1
LOC	0.91304	0.92216	0.91758	0.93929	0.96200	0.95051
GPE	0.95000	0.89764	0.92308	0.98305	0.96667	0.97479
BLD	0.81818	0.46154	0.59016	0.88889	0.76190	0.82051
HWYMSE	1.00000	0.57895	0.73333	1.00000	1.00000	1.00000
NPL	1.00000	0.70000	0.82353	1.00000	1.00000	1.00000
TIME	0.86667	0.95122	0.90698	1.00000	0.96429	0.98182
DATE	0.95000	0.70370	0.80851	0.92857	1.00000	0.96296
MSE	0.84746	0.80645	0.82645	0.90741	0.96078	0.93333
OBJ	0.81522	0.76531	0.78947	0.76056	0.80597	0.78261
O	0.97695	0.98734	0.98212	0.99197	0.98861	0.99029
Weighted Average	0.96256	0.96327	0.96206	0.98071	0.98043	0.98049
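Each per-entity F1 value in Table 6 is the harmonic mean of the corresponding precision and recall. The short Python check below recomputes a few rows from the table; small differences in the last digit are expected because the inputs are already rounded to five decimals.

```python
def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 6.
rows = {
    ("XLNet", "HWYMSE"): (1.00000, 0.57895, 0.73333),
    ("XLNet", "BLD"): (0.81818, 0.46154, 0.59016),
    ("BERT", "BLD"): (0.88889, 0.76190, 0.82051),
}

for (model, entity), (p, r, reported) in rows.items():
    print(f"{model:5s} {entity:6s} computed={f1(p, r):.5f} reported={reported:.5f}")
```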

Table 7. Examples of Misclassification of Location Entity Labeling Using the XLNet Model

ID	Actual Label	Predicted Label	Token	Tweet in Indonesian	Tweet in English Translation
1	LOC	GPE	gresik	09.35: ada proyek pengerjaan jalan di km 11.400 tol gresik arah kebomas. kendaraan hanya bisa melalui lajur kanan, waspadai.	09.35: There is a road construction project at km 11.400 of the Gresik toll road towards Kebomas. Vehicles can only use the right lane; stay alert.
2	LOC	OBJ	walikota	jalan walikota mustajab arah jalan gubeng pojok padat merambat imbas volume	Jalan Walikota Mustajab towards Jalan Gubeng Pojok: dense, slow-moving traffic due to the high volume.
3	BLD	LOC	wiyung	jalan yg depannya rs. wiyung yg masuk mau ke smpn 59 itu juga subhanallah bgt dah kyk jalan di sirkuit	The road in front of Wiyung Hospital leading to SMPN 59 is also really something (subhanallah), like driving on a race circuit.
4	HWYMSE	TIME	11	tol satelit arah waru padat. ada kendaraan mogok di km 11.	The Satelit toll road towards Waru is congested. There is a broken-down vehicle at km 11.

Table 8. Examples of Misclassification of Location Entity Labeling Using the BERT Model

ID	Actual Label	Predicted Label	Token	Tweet in Indonesian	Tweet in English Translation
1	LOC	GPE	surabaya	10.47: info awal kebakaran mobil di km 740 tol sumo arah surabaya dekat gate tol warugunung . wahyudi pendengar ss melaporkan, posisi mobil ada di lajur kiri dan penumpangnya yang berjumlah dua orang sudah turun. hm	10.47: Initial information on a car fire at km 740 of the Sumo toll road towards Surabaya, near the Warugunung toll gate. Wahyudi, an SS listener, reported that the car was in the left lane and its two passengers had gotten out. Hmm
2	BLD	OBJ	perpustakaan	perpustakaan kota surabaya ada pemadaman listrik sampai jam 4/5 an sore ini , jadi belum boleh masuk dan pinjam buku karena sistemnya mati .	The Surabaya city library has a power outage until around 4 or 5 this afternoon, so visitors cannot enter or borrow books because the system is down.
3	LOC	O	baru	mati lampu om . . sebagian wilayah rt01 rw04 simomulyo baru	Power is out, bro... in some areas of RT01 RW04 Simomulyo Baru.
4	LOC	O	jembatan	akibat jembatan lembah dieng putus diarahkan lewat jalan kecil disamping , nah saat ini juga bermasalah , ada lubang dan aspal bergelombang akibat gerusan air	Because the Dieng Valley Bridge collapsed, traffic is diverted via a small road alongside it, but that road now has problems too: there are holes and bumpy asphalt due to water scouring.

Complaint Type Classification Model Evaluation

Complaint tweets must contain at least one entity representing a location, labeled LOC, GPE, BLD, or NPL, because a location is the main requirement for reporting an incident. Thus, after the locations in complaint tweets have been extracted using the BERT or XLNet model, the tweets are classified by complaint type using CLSTM and CNN, and the performance of the two classifiers is compared.
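The gating step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that tweets without any location entity are simply skipped before classification, and the helper names and token labels in the example are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Location entity labels listed in the text (tag names as in Table 6).
LOCATION_LABELS = {"LOC", "GPE", "BLD", "NPL"}


def has_location(labels: Iterable[str]) -> bool:
    """True if at least one token carries a location label.

    Strips an optional "B-"/"I-" prefix in case the tagger emits BIO tags
    (an assumption; the tables show plain labels).
    """
    return any(label.split("-", 1)[-1] in LOCATION_LABELS for label in labels)


def select_for_classification(
    tagged_tweets: Sequence[Tuple[str, List[str]]]
) -> List[str]:
    """Keep only tweets whose predicted token labels include a location entity."""
    return [text for text, labels in tagged_tweets if has_location(labels)]


# Hypothetical tagger output: (tweet text, one label per token).
tagged = [
    ("mati lampu om sebagian wilayah simomulyo baru",
     ["O", "O", "O", "O", "O", "LOC", "LOC"]),
    ("selamat pagi semua", ["O", "O", "O"]),
]
print(select_for_classification(tagged))
# ['mati lampu om sebagian wilayah simomulyo baru']
```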

Table 9 reports the evaluation metrics for each combination of location extraction and complaint type classification models. The highest F1 score is obtained by BERT + CNN, where BERT extracts the location of complaint tweets and CNN classifies the complaint type. CNN achieves the highest F1 score because complaint tweets are short and their sentences are largely independent of one another, so local features are sufficient to represent them. The CNN model is therefore well suited to this task: it extracts local features from a complaint tweet effectively and is not biased toward high-level feature extraction. Local features are extracted in the convolutional layer, whose filters slide over windows of consecutive words to capture word-level, or local, semantic information [32]–[35].
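The local feature extraction described above can be illustrated with a toy, pure-Python version of one convolutional layer followed by max-over-time pooling, a common text-CNN design. The filter sizes, embeddings, and pooling used in this study are not specified here, so all numbers below are made up. Each filter scores every window of consecutive word vectors, and max pooling keeps the strongest local match anywhere in the tweet, regardless of its position.

```python
from typing import List, Sequence

Vector = List[float]


def conv1d_max_pool(
    embeddings: Sequence[Vector],
    filters: Sequence[Sequence[Vector]],
    bias: Sequence[float],
) -> Vector:
    """Slide each filter over windows of consecutive word vectors, then max-pool.

    embeddings: one vector per word (seq_len x dim)
    filters: each filter is a (window_size x dim) weight matrix
    Returns one feature per filter: its strongest ReLU-activated window score.
    """
    features: Vector = []
    for weights, b in zip(filters, bias):
        window = len(weights)
        scores: Vector = []
        for start in range(len(embeddings) - window + 1):
            s = b
            for offset in range(window):  # dot product over the whole window
                s += sum(w * x for w, x in zip(weights[offset], embeddings[start + offset]))
            scores.append(max(0.0, s))  # ReLU
        features.append(max(scores) if scores else 0.0)  # max-over-time pooling
    return features


# Toy 2-dimensional "embeddings" for a three-word tweet (hypothetical values).
tweet = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]  # looks at 2 consecutive words
unigram_filter = [[0.0, 1.0]]             # looks at 1 word
print(conv1d_max_pool(tweet, [bigram_filter, unigram_filter], [0.0, -0.5]))  # [2.0, 0.5]
```

In a trained CNN, the filter weights are learned, and the pooled features are passed to a classification layer.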

Table 10 shows examples of errors made by the complaint type classification models. In example 1, CLSTM incorrectly predicts a tweet that should be labeled "Power Outages" as "Non-Complaint". This misclassification can be caused by class imbalance, which biases the model toward the majority class: "Non-Complaint" is the majority class, with more training examples than any other label. The model therefore tends to over-predict the majority class, because the probability of labeling a complaint tweet as "Non-Complaint" increases [36], and data that belong to a minority class are more often assigned to the majority class.
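The effect of a skewed class distribution can be illustrated with Bayes' rule, using made-up numbers that are not from this study's data. A neural classifier such as CLSTM does not apply Bayes' rule explicitly, but training on imbalanced data tends to build a similar preference for the majority class into its output probabilities. In this sketch, the same ambiguous tweet is assigned to "Power Outages" under balanced class priors but to "Non-Complaint" once the majority class dominates the prior.

```python
from typing import Dict


def posterior(likelihood: Dict[str, float], prior: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: P(class | tweet) is proportional to P(tweet | class) * P(class)."""
    joint = {c: likelihood[c] * prior[c] for c in prior}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}


# Hypothetical tweet that is slightly more typical of a power outage complaint.
likelihood = {"Non-Complaint": 0.4, "Power Outages": 0.6}

balanced = posterior(likelihood, {"Non-Complaint": 0.5, "Power Outages": 0.5})
skewed = posterior(likelihood, {"Non-Complaint": 0.8, "Power Outages": 0.2})

print(max(balanced, key=balanced.get))  # Power Outages
print(max(skewed, key=skewed.get))      # Non-Complaint
```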

Neither the CLSTM nor the CNN model can correctly classify complaint tweets that contain figures of speech such as cynicism, an expression that mocks something by saying the opposite of what is meant. Example 2 in Table 10 is a complaint of this kind. The Twitter user does not state explicitly that the road is damaged but instead expresses annoyance at traveling on it through the phrase "jalanya aduhai" ("the roads are awesome"). Another factor that prevents the CLSTM and CNN models from classifying complaint tweets well is the informal language of Twitter, which often ignores standard grammar. Example 2 also has no subject, which makes the complaint ambiguous and prevents the models from predicting it accurately.

Errors in location extraction are another cause of complaint type misclassification, because they propagate to the classification step. In example 4 of Table 8, BERT incorrectly labels the word "jembatan" (bridge) as O (Other) instead of as part of a location. This extraction error leads the CNN model to predict the wrong complaint type for the same tweet (Water Quality instead of Damaged Road), as shown in example 3 of Table 10, whereas both XLNet-based combinations classify it correctly. Thus, the location extraction process strongly influences complaint type classification.

Table 9. Evaluation Metrics in the Complaint Type Classification Model

Model	Complaint Type Category	Precision	Recall	F1
XLNet+CNN	Non-Complaint	0.83495	0.95556	0.89119
	Damaged Road	0.88235	0.71429	0.78947
	Internet Trouble	0.94118	0.72727	0.82051
	Water Quality	0.97143	0.87179	0.91892
	Power Outages	0.94915	0.94915	0.94915
	Weighted Average	0.90159	0.89610	0.89470
XLNet+CLSTM	Non-Complaint	0.77570	0.92222	0.84264
	Damaged Road	0.68182	0.71429	0.69767
	Internet Trouble	0.89474	0.77273	0.82927
	Water Quality	0.93548	0.74359	0.82857
	Power Outages	0.98077	0.86441	0.91892
	Weighted Average	0.85786	0.84416	0.84529
BERT+CNN	Non-Complaint	0.83654	0.98864	0.90625
	Damaged Road	0.94444	0.73913	0.82927
	Internet Trouble	0.94737	0.75000	0.83721
	Water Quality	0.97143	0.87179	0.91892
	Power Outages	0.96552	0.93333	0.94915
	Weighted Average	0.91406	0.90598	0.90471
BERT+CLSTM	Non-Complaint	0.74528	0.89773	0.81443
	Damaged Road	0.56522	0.56522	0.56522
	Internet Trouble	0.95238	0.83333	0.88889
	Water Quality	0.93750	0.76923	0.84507
	Power Outages	0.98077	0.85000	0.91071
	Weighted Average	0.84124	0.82479	0.82737
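The weighted averages in Tables 6 and 9 appear to be support-weighted means of the per-class scores, as in scikit-learn's `average="weighted"`. This is an assumption, since the averaging method is not stated here. One consequence is that the weighted F1 is not the harmonic mean of the weighted precision and recall: for BERT+CNN, the harmonic mean of the weighted values is about 0.910, whereas Table 9 reports 0.90471. The sketch below shows both calculations; the class sizes used for weighting are hypothetical because per-class support is not reported.

```python
from typing import Sequence


def weighted_f1(per_class_f1: Sequence[float], support: Sequence[int]) -> float:
    """Support-weighted mean of per-class F1 scores."""
    total = sum(support)
    return sum(f * n for f, n in zip(per_class_f1, support)) / total


def harmonic_mean(p: float, r: float) -> float:
    return 2 * p * r / (p + r)


# Weighted-average precision and recall of BERT+CNN, copied from Table 9.
p_weighted, r_weighted, f1_weighted_reported = 0.91406, 0.90598, 0.90471
print(harmonic_mean(p_weighted, r_weighted))  # about 0.910, not the reported 0.90471

# Per-class F1 of BERT+CNN from Table 9 (Non-Complaint, Damaged Road,
# Internet Trouble, Water Quality, Power Outages).
bert_cnn_f1 = [0.90625, 0.82927, 0.83721, 0.91892, 0.94915]
# Hypothetical class sizes, NOT from the paper; only to show the weighting.
hypothetical_support = [90, 20, 20, 40, 60]
print(round(weighted_f1(bert_cnn_f1, hypothetical_support), 5))
```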

Table 10. Examples of Sentence Errors in the Complaint Type Classification Model

ID	Tweet in Indonesian	Tweet in English Translation	Actual Label	BERT + CLSTM	BERT + CNN	XLNet + CLSTM	XLNet + CNN
1	ini listrik padam daerah gayungsari timur, surabaya mulai pk 09.00 pagi tadi gimana ini kita bayar telat aja diputus kalau listrik gak segera nyala apa kompensasi nya ? hey pln	The power has been out in the East Gayungsari area, Surabaya, since 09.00 this morning. What is going on? When we pay late, we get cut off. If the electricity does not come back on soon, what is the compensation? Hey, PLN.	Power Outages	Non-Complaint	Power Outages	Non-Complaint	Power Outages
2	daerah lakarsantri ke utara juga jalanya aduhai	The Lakarsantri area to the north also has awesome roads.	Damaged Road	Non-Complaint	Non-Complaint	Non-Complaint	Non-Complaint
3	akibat jembatan lembah putus diarahkan lewat jalan kecil disamping nah sat ini juga bermasalah ada lubang dan aspal bergelombang akibat gerusan air	Because the Dieng Valley Bridge collapsed, traffic is diverted via a small road alongside it, but that road now has problems too: there are holes and bumpy asphalt due to water scouring.	Damaged Road	Non-Complaint	Water Quality	Damaged Road	Damaged Road

Data Visualization Evaluation

Functionality testing was carried out by acting as a user and running each feature of the developed complaint monitoring system. The first functional requirement is that the user can open the website and choose the timeframe of complaints, as shown in Figure 2. The next functional requirement is that the user can view complaint tweets as pinpoints on a map. To do so, the user presses the three-line icon in the upper-right corner of the complaint monitoring system and selects a time range; the system then displays the distribution of complaint tweets within that range. As shown in Figure 3, pressing a pinpoint on the map displays the complaint tweet at that pinpoint. Pink pinpoints mark damaged road complaints, light blue pinpoints mark water quality complaints, blue pinpoints mark internet complaints, and orange pinpoints mark power outage complaints.
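The time-range filter and color coding of the map view can be sketched as follows. The record layout, field names, and geocoded coordinates are hypothetical, since this section does not describe how pinpoints are stored; only the mapping from complaint type to pin color comes from the description above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# Pin colors per complaint type, as described for the map view.
PIN_COLORS = {
    "Damaged Road": "pink",
    "Water Quality": "light blue",
    "Internet Trouble": "blue",
    "Power Outages": "orange",
}


@dataclass
class Complaint:
    # Hypothetical record layout; lat/lon assume the extracted location was geocoded.
    user_id: str
    text: str
    day: date
    complaint_type: str
    lat: float
    lon: float


def pins_in_range(
    complaints: List[Complaint], start: date, end: date
) -> List[Tuple[float, float, str, str]]:
    """Return (lat, lon, color, tweet) for complaints dated within [start, end].

    Tweets whose type has no pin color (e.g. "Non-Complaint") are skipped.
    """
    return [
        (c.lat, c.lon, PIN_COLORS[c.complaint_type], c.text)
        for c in complaints
        if start <= c.day <= end and c.complaint_type in PIN_COLORS
    ]
```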

Another functional requirement is that users can see a list of complaints in tabular form. Figure 4 lists complaints in a table containing the Twitter user identity (User ID), complaint tweet, complaint date, and complaint type. The complaint monitoring system also provides a complaint recapitulation chart by month and sub-district to make it easier for companies to handle public complaints. Figure 5 shows the complaint recapitulation chart by month and sub-district. Its legend explains the meaning of the bar colors: pink bars show the number of damaged road complaints, light blue bars water quality complaints, blue bars internet complaints, and orange bars power outage complaints. Thus, the complaint monitoring system passed all of the functionality tests.
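The recapitulation behind such a chart amounts to counting complaints per month, sub-district, and complaint type. The sketch below assumes each classified complaint has been reduced to a hypothetical (date, sub-district, complaint type) tuple; the records in the example are made up.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple


def recap(complaints: Iterable[Tuple[date, str, str]]) -> Counter:
    """Count complaints per (month, sub-district, complaint type).

    Each complaint is a hypothetical (date, sub_district, complaint_type) tuple.
    """
    return Counter(
        (day.strftime("%Y-%m"), sub_district, complaint_type)
        for day, sub_district, complaint_type in complaints
    )


# Hypothetical records, for illustration only.
records = [
    (date(2022, 1, 5), "Wiyung", "Damaged Road"),
    (date(2022, 1, 20), "Wiyung", "Damaged Road"),
    (date(2022, 2, 1), "Sukomanunggal", "Power Outages"),
]
for key, count in sorted(recap(records).items()):
    print(key, count)
```

Each (month, sub-district) pair then becomes one group of bars, with one bar per complaint type.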

This study developed a complaint monitoring system for Indonesian-language Twitter text based on transformer-based information extraction. The BERT and XLNet models are used to extract the locations of complaint tweets, because location information serves as authentic geographic evidence in the monitoring system and lets the company find out where a complaint occurred. Various experimental scenarios and analyses were carried out on the models used in this study. The evaluation and analysis results show that the combination of BERT and CNN obtains the highest F1 score, 0.90471, with BERT extracting location information and CNN classifying the complaint types in tweets. The location extraction process strongly influences complaint type classification. CNN classifies the complaint types in tweets well because it adequately extracts local features from a complaint tweet and is not biased toward high-level feature extraction. In addition, the locations of complaints were visualized to make it easier for companies to find where complaints were reported and to act on public complaints immediately. Users can choose the timeframe for complaints, view complaint tweets as pinpoints on the map, and see a list of complaints containing Twitter user identities, complaint tweets, complaint dates, and complaint types.

In future work, the information extraction model needs to be developed further so that it can extract locations down to finer administrative levels, such as villages and sub-districts. Both the BERT and XLNet models still cannot always identify the actual location of a complaint tweet, which can cause the complaint type to be misclassified. Therefore, an information extraction model that can find detailed information about the actual location of an event in Twitter text is also needed, which requires extra effort in annotating the dataset.

Ethics approval and consent to participate

This article is original and has not been published elsewhere.

Consent for publication

The authors affirm that all data were collected in accordance with the Twitter Developer Policy (2022), https://developer.twitter.com/en/developer-terms/policy, and that no private information is disclosed.

Availability of data and materials

Not applicable

Competing interests

Not applicable

Funding

Not applicable

Authors' contributions

Diana Purwitasari: Conceptualization, Methodology, Investigation, Writing - Original Draft, Writing - Review & Editing

Chastine Fatichah: Conceptualization, Methodology, Formal analysis, Writing - Review & Editing, Supervision

Amelia Devi Putri Ariyanto: Validation, Investigation, Resources, Data Curation, Writing - Original Draft

Sherly Rosa Anggraeni, Aulia Eka Putri Aryani: Software, Validation, Resources, Data Curation, Visualization

Acknowledgements

Not applicable

[1] S. A. Einwiller and S. Steilen, “Handling complaints on social network sites - An analysis of complaints and complaint responses on Facebook and Twitter pages of large US companies,” Public Relat Rev, vol. 41, no. 2, pp. 195–204, Jun. 2015, doi: 10.1016/j.pubrev.2014.11.012.
[2] D. Istanbulluoglu, “Complaint handling on social media: The impact of multiple response times on consumer satisfaction,” Comput Human Behav, vol. 74, pp. 72–82, Sep. 2017, doi: 10.1016/j.chb.2017.04.016.
[3] T. Pratama and A. Purwarianti, “Topic classification and clustering on Indonesian complaint tweets for Bandung government using supervised and unsupervised learning,” in 2017 International Conference on Advanced Informatics, Concepts, Theory, and Applications (ICAICTA), Aug. 2017, pp. 1–6. doi: 10.1109/ICAICTA.2017.8090981.
[4] Y. HaCohen-Kerner, R. Dilmon, M. Hone, and M. A. Ben-Basan, “Automatic classification of complaint letters according to service provider categories,” Inf Process Manag, vol. 56, no. 6, Nov. 2019, doi: 10.1016/j.ipm.2019.102102.
[5] A. S. Neogi, K. A. Garg, R. K. Mishra, and Y. K. Dwivedi, “Sentiment analysis and classification of Indian farmers’ protest using twitter data,” International Journal of Information Management Data Insights, vol. 1, no. 2, Nov. 2021, doi: 10.1016/j.jjimei.2021.100019.
[6] J. Osorio-Arjona, J. Horak, R. Svoboda, and Y. García-Ruíz, “Social media semantic perceptions on Madrid Metro system: Using Twitter data to link complaints to space,” Sustain Cities Soc, vol. 64, Jan. 2021, doi: 10.1016/j.scs.2020.102530.
[7] A. Singh, S. Saha, M. Hasanuzzaman, and A. Jangra, “Identifying complaints based on semi-supervised mincuts,” Expert Syst Appl, vol. 186, Dec. 2021, doi: 10.1016/j.eswa.2021.115668.
[8] L. Belcastro et al., “Using social media for sub-event detection during disasters,” J Big Data, vol. 8, no. 1, Dec. 2021, doi: 10.1186/s40537-021-00467-1.
[9] A. Kumar and J. P. Singh, “Location reference identification from tweets during emergencies: A deep learning approach,” International Journal of Disaster Risk Reduction, vol. 33, pp. 365–375, Feb. 2019, doi: 10.1016/j.ijdrr.2018.10.021.
[10] M. Hasan, M. A. Orgun, and R. Schwitter, “Real-time event detection from the Twitter data stream using the TwitterNews+ Framework,” Inf Process Manag, vol. 56, no. 3, pp. 1146–1165, May 2019, doi: 10.1016/j.ipm.2018.03.001.
[11] Y. Yang, Z. Wu, Y. Yang, S. Lian, F. Guo, and Z. Wang, “A Survey of Information Extraction Based on Deep Learning,” Applied Sciences (Switzerland), vol. 12, no. 19. MDPI, Oct. 01, 2022. doi: 10.3390/app12199691.
[12] J. Han and H. Wang, “Transformer based network for Open Information Extraction,” Eng Appl Artif Intell, vol. 102, Jun. 2021, doi: 10.1016/j.engappai.2021.104262.
[13] N. R. Kim and S. G. Hong, “Text mining for the evaluation of public services: the case of a public bike-sharing system,” Service Business, vol. 14, no. 3, pp. 315–331, Sep. 2020, doi: 10.1007/s11628-020-00419-4.
[14] R. Yan, X. Jiang, and D. Dang, “Named Entity Recognition by Using XLNet-BiLSTM-CRF,” Neural Process Lett, vol. 53, no. 5, pp. 3339–3356, Oct. 2021, doi: 10.1007/s11063-021-10547-1.
[15] P. K. Putra, R. Mahendra, and I. Budi, “Traffic and road conditions monitoring system using extracted information from Twitter,” J Big Data, vol. 9, no. 1, Dec. 2022, doi: 10.1186/s40537-022-00621-3.
[16] R. Prasad, A. U. Udeme, S. Misra, and H. Bisallah, “Identification and classification of transportation disaster tweets using improved bidirectional encoder representations from transformers,” International Journal of Information Management Data Insights, vol. 3, no. 1, p. 100154, Apr. 2023, doi: 10.1016/j.jjimei.2023.100154.
[17] C.-Y. Huang, H. Tong, J. He, and R. Maciejewski, “Location Prediction for Tweets,” Front Big Data, vol. 2, 2019, doi: 10.3389/fdata.2019.00005.
[18] A. Pamungkas, D. Iranata, J. Yuwono, and L. M. Jaelani, “An insight on Surabaya development: Pre colonials, colonial, post colonial and current era,” in IOP Conference Series: Earth and Environmental Science, Institute of Physics Publishing, Oct. 2019. doi: 10.1088/1755-1315/340/1/012002.
[19] M. R. Nair, G. R. Ramya, and P. B. Sivakumar, “Usage and analysis of Twitter during 2015 Chennai flood towards disaster management,” in Procedia Computer Science, Elsevier B.V., 2017, pp. 350–358. doi: 10.1016/j.procs.2017.09.089.
[20] F. Farhangi, “Investigating the role of data preprocessing, hyperparameters tuning, and type of machine learning algorithm in the improvement of drowsy EEG signal modeling,” Intelligent Systems with Applications, vol. 15, Sep. 2022, doi: 10.1016/j.iswa.2022.200100.
[21] S. Behl, A. Rao, S. Aggarwal, S. Chadha, and H. S. Pannu, “Twitter for disaster relief through sentiment analysis for COVID-19 and natural hazard crises,” International Journal of Disaster Risk Reduction, vol. 55, Mar. 2021, doi: 10.1016/j.ijdrr.2021.102101.
[22] L. Huang, S. Zhuang, and K. Wang, “A text normalization method for speech synthesis based on local attention mechanism,” IEEE Access, vol. 8, pp. 36202–36209, 2020, doi: 10.1109/ACCESS.2020.2974674.
[23] Rianto, A. B. Mutiara, E. P. Wibowo, and P. I. Santosa, “Improving the accuracy of text classification using stemming method, a case of non-formal Indonesian conversation,” J Big Data, vol. 8, no. 1, Dec. 2021, doi: 10.1186/s40537-021-00413-1.
[24] F. Dernoncourt, J. Y. Lee, and P. Szolovits, “NeuroNER: an easy-to-use program for named-entity recognition based on neural networks,” in Association for Computational Linguistics, Association for Computational Linguistics, 2017, pp. 97–102. [Online]. Available: https://github.com/
[25] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer, “Neural Architectures for Named Entity Recognition,” in Association for Computational Linguistics, 2016, pp. 260–270. [Online]. Available: https://github.com/
[26] V. Rachman, S. Savitri, F. Augustianti, and R. Mahendra, “Named Entity Recognition on Indonesian Twitter Posts Using Long Short-Term Memory Networks,” in 2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Oct. 2017. doi: 10.1109/ICACSIS.2017.8355038.
[27] L. F. Simanjuntak, R. Mahendra, and E. Yulianti, “We Know You Are Living in Bali: Location Prediction of Twitter Users Using BERT Language Model,” Big Data and Cognitive Computing, vol. 6, no. 3, Sep. 2022, doi: 10.3390/bdcc6030077.
[28] R. K. Behera, M. Jena, S. K. Rath, and S. Misra, “Co-LSTM Convolutional LSTM model for sentiment analysis in social big data,” Inf Process Manag, vol. 58, no. 1, 2021.
[29] M. E. Basiri, S. Nemati, M. Abdar, S. Asadi, and U. R. Acharya, “A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets,” Knowl Based Syst, vol. 228, Sep. 2021, doi: 10.1016/j.knosys.2021.107242.
[30] W. Zhang and J. Gelernter, “Geocoding location expressions in Twitter messages: A preference learning method,” Journal of Spatial Information Science, vol. 9, Dec. 2014, doi: 10.5311/JOSIS.2014.9.170.
[31] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized Autoregressive Pretraining for Language Understanding,” in NIPS’19: Proceedings of the 33rd International Conference on Neural Information Processing Systems, Jun. 2019. [Online]. Available: http://arxiv.org/abs/1906.08237
[32] Y. Zhou, J. Li, J. Chi, W. Tang, and Y. Zheng, “Set-CNN: A text convolutional neural network based on semantic extension for short text classification,” Knowl Based Syst, vol. 257, Dec. 2022, doi: 10.1016/j.knosys.2022.109948.
[33] R. Haque, N. Islam, M. Tasneem, and A. K. Das, “Multi-class sentiment classification on Bengali social media comments using machine learning,” International Journal of Cognitive Computing in Engineering, vol. 4, pp. 21–35, Jun. 2023, doi: 10.1016/j.ijcce.2023.01.001.
[34] M. Umer et al., “Impact of convolutional neural network and FastText embedding on text classification,” Multimed Tools Appl, vol. 82, no. 4, pp. 5569–5585, Feb. 2023, doi: 10.1007/s11042-022-13459-x.
[35] H. Liang, X. Sun, Y. Sun, and Y. Gao, “Text feature extraction based on deep learning: a review,” Eurasip Journal on Wireless Communications and Networking, vol. 2017, no. 1. Springer International Publishing, Dec. 01, 2017. doi: 10.1186/s13638-017-0993-1.
[36] J. M. Johnson and T. M. Khoshgoftaar, “Survey on deep learning with class imbalance,” J Big Data, vol. 6, no. 1, Dec. 2019, doi: 10.1186/s40537-019-0192-5.

No competing interests reported.

Status:

Version 1
