SpanMTL: a span-based multi-table labeling framework for aspect-oriented fine-grained opinion extraction

Aspect-oriented Fine-grained Opinion Extraction (AFOE) aims to extract the aspect terms, corresponding opinion terms and sentiment polarity in a target sentence. Most previous methods treat AFOE as either a word-level or a span-level task, ignoring the complementarity of the two granularities. To integrate the merits of word-level and span-level information, we construct an end-to-end Span-based Multi-Table Labeling (SpanMTL) framework, which combines word-based and span-based table labeling to tackle the AFOE task. Specifically, in the proposed model, we use two separate BiLSTMs to encode the information of aspect and opinion terms into a word-based 2D representation table. Based on this table, we construct a span-based table with a CNN by associating the word-pair representations. Finally, we integrate the table label distributions of the word- and span-based table labeling to generate the multi-table labeling result. The proposed method improves the performance of the Opinion Pair Extraction (OPE) and Opinion Triplet Extraction (OTE) tasks by introducing span information, especially on datasets with many spans. We have conducted various experiments on AFOE datasets to validate our method. The experimental results show that our method outperforms other baselines when sentences contain many spans.


Introduction
Aspect-based sentiment analysis (ABSA) [1][2][3][4] is a fine-grained task in sentiment analysis [5][6][7] that mainly focuses on aspect terms, opinion terms and sentiment polarity to obtain word-level or span-level sentiment information in sentences. Aspect terms are words or spans that describe the entities in a sentence, such as "waiters" and "pasta" in the sentence in Fig. 1. Opinion terms are words or spans that reflect the subjective attitudes towards the related aspect terms, such as "friendly" and "not good" in the same sentence. Sentiment polarity is the emotion attribute (positive, neutral or negative) determined by an aspect term and its related opinion terms. For example, the sentiment polarity of the aspect term "waiters" in the example sentence is "positive".

Fig. 1 An example of the Aspect-oriented Fine-grained Opinion Extraction task on the sentence "Waiters are friendly and the pasta is not good."
As the ABSA task has received more and more attention from scholars, many subtasks around the above three elements have been proposed, corresponding to the example shown in Fig. 1, which are explained as follows:
• Aspect Term Extraction (ATermE): extracting aspect term(s);
• Opinion Term Extraction (OTermE): extracting opinion term(s);
• Aspect Term Sentiment Analysis (ATSA): predicting the sentiment polarity of the annotated aspect term(s);
• Target-oriented Opinion Words Extraction (TOWE): extracting the opinion term(s) of annotated aspect term(s);
• Opinion Pair Extraction (OPE): extracting the aspect term(s) and corresponding opinion term(s);
• Opinion Triplet Extraction (OTE) or Aspect Sentiment Triplet Extraction (ASTE): extracting the aspect term(s) and related opinion term(s), and predicting the corresponding sentiment polarity;
• Aspect-oriented Fine-grained Opinion Extraction (AFOE): extracting aspect-opinion pair(s) or aspect-opinion-sentiment triplet(s).
The basic subtasks of ABSA, such as ATermE and OTermE, cannot fully analyze the fine-grained sentiment information in sentences. Therefore, Peng et al. [8] and Zhao et al. [9] propose the OTE task and the OPE task, respectively. Wu et al. [10] then combine these two tasks into the AFOE task and provide four AFOE datasets. In contrast to methods based on co-extraction with joint models [8,11], pipeline methods first extract aspect terms and opinion terms and then pair them. This pipeline framework is intuitive, but it ignores the relationship between aspect terms and opinion terms, which may result in error propagation.
To overcome such drawbacks, many end-to-end approaches have been proposed to solve the OPE [9] and OTE [12][13][14][15][16] tasks. However, most of these methods only address the OPE or OTE task separately. Although Wu et al. [10] present a novel labeling method, GTS, to solve the OPE and OTE tasks simultaneously, GTS is not good at extracting pairs or triplets containing spans. SpanMlt [9] pays attention to span extraction when solving the OPE task. However, most existing methods define OPE and OTE as word-level tasks, ignoring span information: the interaction between spans is ignored in the span encoding stage, and words and spans are not distinguished in the final extraction stage. This leads to suboptimal AFOE results when many aspect terms and opinion terms in a sentence are spans.
In this paper, we propose a Span-based Multi-Table Labeling method, called SpanMTL, which aims to extract aspect-opinion pair(s) or aspect-opinion-sentiment triplet(s) in a sentence through a 2D table labeling scheme on word and span tables. For word encoding, we use two separate BiLSTMs to simultaneously obtain aspect term information and opinion term information. Then we build a word-based 2D representation table composed of aspect and opinion word representations. For table encoding, we use a multi-dimensional recurrent neural network (MDRNN) [17] to learn the interactive information between elements in the 2D table and obtain contextual pair representations. For span encoding, we use the pair representations learned from the 2D word table to obtain span pair representations through a convolutional neural network (CNN), yielding a span-based 2D table. For decoding, we use the table labeling method [10] to label the word- and span-based 2D tables. Finally, we integrate the labeling results of both tables and take the average value as the final result. We have conducted various experiments on four benchmark datasets to compare our SpanMTL model with other baseline models. The experimental results show that our proposed model outperforms other state-of-the-art methods on the 15res and 16res datasets, which contain many spans. The main contributions of our method are summarized as follows:
• We generate word representations learned from two separate BiLSTMs, which encode words into aspect- and opinion-specific word representations.
• We design a CNN-based span encoder to convert word pair representations into span pair representations to extract span information.
• Our method can employ both word- and span-level information to improve the performance of the AFOE task.

Related work
Aspect-based sentiment analysis (ABSA) was first proposed by Hu and Liu [1] and defined as an aspect-level task that aims to mine the emotion information of the aspect terms in a sentence. ABSA is also called fine-grained opinion mining, and mainly includes aspect term extraction (ATermE) [18][19][20], opinion term extraction (OTermE) [21], aspect term sentiment analysis (ATSA) [22][23][24][25], target-oriented opinion words extraction (TOWE) [26], etc. These subtasks dig out different kinds of information, but they do not integrate this useful information into one task. In order to extract comprehensive aspect-level information from unmarked data, the opinion pair extraction (OPE) task and the opinion triplet extraction (OTE) task were proposed. For the OPE task, Zhao et al. [9] propose a span-based multi-task end-to-end model, which first obtains representations of all words and spans and then predicts their labels and relationships. Chen et al. [16] and Mao et al. [15] transform the OPE task into a machine reading comprehension task. Zhang et al. [12] tackle the OPE task with a multi-task learning framework. For the OTE task, Peng et al. [8] first propose a two-step method: the first step extracts aspect terms and sentiment polarity through a unified labeling model and extracts opinion terms through a graph convolutional neural network; the second step uses a binary classifier to judge all candidate triplets. From the same extract-then-judge perspective, Huang et al. [11] also propose a novel two-step method. However, this two-step framework suffers from error propagation. To overcome this flaw, most recent works employ an end-to-end framework to solve the OTE task. Xu et al. [14] design a novel position-aware tagging scheme that is capable of jointly extracting the triplets.
In order to comprehensively evaluate the OPE and OTE tasks, Wu et al. [10] collectively refer to these two tasks as aspect-oriented fine-grained opinion extraction (AFOE) and propose a novel tagging scheme, the Grid Tagging Scheme (GTS). This method can mark the relationship between all word pairs in a word-based table. However, it cannot effectively extract aspect terms or opinion terms when these terms are spans. In addition, using one word encoder to obtain both aspect and opinion word representations means that the difference between aspect and opinion words is not considered. In contrast, we propose a Span-based Multi-Table Labeling method (SpanMTL), which exploits both word- and span-level information to improve the performance of the AFOE task.

Problem Definition
Given a sentence C = (w_1, w_2, ..., w_c), where c is the number of words, the goal of opinion pair extraction (OPE) is to extract a set of aspect-opinion (A-O) pairs in C:

P = {(a_1, o_1), ..., (a_P, o_P)},

where (a_j, o_j) is the j-th aspect-opinion pair, a is the aspect term, o is the corresponding opinion term, and P is the number of opinion pairs. The goal of opinion triplet extraction (OTE) is to extract a set of aspect-opinion-sentiment (A-O-S) triplets in C:

T = {(a_1, o_1, s_1), ..., (a_T, o_T, s_T)},

where (a_j, o_j, s_j) is the j-th aspect-opinion-sentiment triplet, a is the aspect term, o is the corresponding opinion term, s is the sentiment polarity with s ∈ {Positive, Neutral, Negative}, and T is the number of triplets. Note that a sentence may contain more than one aspect term and opinion term, an aspect term can be related to more than one opinion term, and the same opinion term may correspond to multiple aspect terms, so a sentence can yield multiple A-O pairs and A-O-S triplets.

Table Labeling Scheme
We address the AFOE task based on GTS [10], which utilizes two sets of target labels, Y_P = {A, O, P, N} and Y_T = {A, O, Pos, Neu, Neg, N}, to represent the relation of any word pair (w_i, w_j) or span pair (s_m, s_n) in a sentence for the OPE and OTE tasks, respectively. The labels "A" and "O" represent Aspect and Opinion, meaning that the current word pair or span pair belongs to an aspect term or opinion term. The label "P" indicates that the word pair or span pair forms an A-O pair for the OPE task. The labels "Pos", "Neu" and "Neg" indicate that the word pair or span pair forms an A-O-S triplet with the corresponding sentiment polarity for the OTE task. The label "N" represents Other, indicating that the current word pair or span pair is irrelevant. Labeling examples for the OPE and OTE tasks are shown in Fig. 2. Note that, in order to make the extracted information more visible, the label "N" is not marked in Fig. 2.
After obtaining the final labels in the table, we extract the A-O pair(s) or A-O-S triplet(s). First, if the pair formed by a word with itself, or all pairs formed by the words within a span, are marked with the label "A" or "O", then this word or span is extracted as an aspect term or opinion term. Second, after obtaining all the aspect terms and opinion terms, if a word pair composed of an aspect term and an opinion term is marked with "P" or any sentiment polarity label ("Pos", "Neu", "Neg"), then this aspect term and opinion term are combined into an A-O pair or A-O-S triplet.

Fig. 2 Labeling examples for the OPE and OTE tasks on the sentence "Waiters are friendly and the pasta is not good". The lower triangular grid shows the OPE task and the upper triangular grid shows the OTE task.
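These two extraction rules can be sketched directly in code. The following is a minimal, hypothetical decoder (the helper `extract_pairs` and its span-scanning logic are our own illustration, not the authors' released implementation), assuming the table stores one label string per word pair from Y_P = {"A", "O", "P", "N"}:

```python
def extract_pairs(words, table):
    # Rule 1: a word is a term if its diagonal cell carries "A"/"O"; a span
    # is a term if every word pair formed inside it carries the same label.
    def spans_with(label):
        spans, i, n = [], 0, len(words)
        while i < n:
            if table[i][i] == label:
                j = i
                while j + 1 < n and all(
                    table[p][q] == label
                    for p in range(i, j + 2) for q in range(i, j + 2)
                ):
                    j += 1
                spans.append((i, j))
                i = j + 1
            else:
                i += 1
        return spans

    aspects, opinions = spans_with("A"), spans_with("O")
    # Rule 2: an aspect and an opinion form an A-O pair if any word pair
    # between them (in either triangle of the table) is labeled "P".
    pairs = []
    for (ai, aj) in aspects:
        for (oi, oj) in opinions:
            if any(table[p][q] == "P" or table[q][p] == "P"
                   for p in range(ai, aj + 1) for q in range(oi, oj + 1)):
                pairs.append((" ".join(words[ai:aj + 1]),
                              " ".join(words[oi:oj + 1])))
    return pairs
```

On the Fig. 1 sentence, marking the diagonal cells of the gold terms and a "P" cell linking each aspect to its opinion recovers exactly the two gold A-O pairs.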

Model Description
In this work, we propose a Span-based Multi-Table Labeling (SpanMTL) model to solve the aspect-oriented fine-grained opinion extraction task. As shown in Fig. 3, our method mainly consists of three components: (1) word encoding and word-based table encoding, in which word pairs are composed of the embeddings of aspect words and opinion words learned by two independent word encoders, and the influence of pairs and adjacent elements is then captured by a table encoder; (2) span encoding and span-based table encoding, in which span pair representations are derived from the word-based table and contextualized by another table encoder; and (3) multi-table labeling, in which the label distributions of the word- and span-based tables are integrated to produce the final extraction results.

Word Encoding
In order to capture the semantic information of aspect words and opinion words in a sentence respectively, we design an aspect-specific word encoder and an opinion-specific word encoder. Each encoder is composed of a bidirectional Long Short-Term Memory (BiLSTM) layer and an attention layer. The input is a sentence with multiple words, C = (w_1, w_2, ..., w_c), where c is the number of words. We first encode all words in the sentence through the two BiLSTM layers separately to obtain the aspect-specific word representations h^a and opinion-specific word representations h^o with sequence features:

h_i = [→h_i ; ←h_i],

where →h_i is the forward hidden state and ←h_i is the backward hidden state. Then we use the attention mechanism to focus on the aspect and opinion word information in the sentence and obtain the word representations r^a and r^o.

Word-based Table Encoding
Different from GTS [10], which obtains the word pair representation from a single encoder, we use the word representations learned by the two independent encoders to form word pairs. Then we build a 2D word-based table T^word, a c × c matrix that contains all possible word pairs. The pair representation with the i-th word as the aspect word and the j-th word as the opinion word is defined as:

T^word_{i,j} = [r^a_i ; r^o_j].

After that, we design a table encoder to contextualize the current element through its surrounding elements in two directions, i.e. upper and left. The table encoder uses a Multi-Dimensional Recurrent Neural Network (MDRNN) [17] with Gated Recurrent Units (GRU) [27] to enrich the 2D table element representations T^word_{i,j} and obtain the interacted table M^word. The new representation of each word pair is calculated by:

M^word_{i,j} = GRU(T^word_{i,j}, T^word_{i−1,j}, T^word_{i,j−1}),

where T^word_{i−1,j} and T^word_{i,j−1} are the upper and left elements of the current pair T^word_{i,j} in the 2D table.
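As a rough illustration of the data flow, the following toy sketch (the helpers `build_word_table` and `contextualize` are our own; a simple mean over neighbors stands in for the learned MDRNN-GRU update, and vectors are plain Python lists) shows how the table is built and contextualized top-left to bottom-right:

```python
def build_word_table(r_a, r_o):
    # T_word[i][j] concatenates the i-th aspect-specific vector r_a[i]
    # with the j-th opinion-specific vector r_o[j].
    return [[ra + ro for ro in r_o] for ra in r_a]

def contextualize(T):
    # Each cell is fused with its already-computed upper and left
    # neighbors while scanning from the top-left corner (a stand-in
    # for the two-direction MDRNN-GRU table encoder).
    c = len(T)
    M = [[None] * c for _ in range(c)]
    for i in range(c):
        for j in range(c):
            neighbors = [T[i][j]]
            if i > 0:
                neighbors.append(M[i - 1][j])
            if j > 0:
                neighbors.append(M[i][j - 1])
            dim = len(T[i][j])
            M[i][j] = [sum(v[k] for v in neighbors) / len(neighbors)
                       for k in range(dim)]
    return M
```

The scan order guarantees that M[i−1][j] and M[i][j−1] are available when cell (i, j) is computed, mirroring the dependency structure of the table encoder.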

Span Encoding
In order to solve the problem that existing methods cannot effectively obtain and extract span information, we design a span encoder composed of a convolutional neural network (CNN), which converts the word pair representations in the word-based table into a span pair representation T^span_{m,n}:

T^span_{m,n} = CNN(M^word_{i,j}, M^word_{i−1,j}, M^word_{i,j+1}),

where M^word_{i−1,j} and M^word_{i,j+1} are the upper and right word pairs of the current position. As shown in Fig. 4, this setting allows span pairs to aggregate more comprehensive word-level information.
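Under one plausible reading of this construction, the span pair at (m, n) pools the word pair at (i, j) = (m + 1, n) together with its upper and right neighbors, producing a (c−1) × (c−1) span table from a c × c word table. The sketch below is our own illustration (mean pooling stands in for the learned CNN filter, and indices are 0-based):

```python
def span_table(M):
    # M: c x c table of word-pair vectors (plain Python lists).
    # Returns a (c-1) x (c-1) table of span-pair vectors, each pooled
    # from the word pair at (m + 1, n) plus its upper and right neighbors.
    c = len(M)
    dim = len(M[0][0])
    T_span = [[None] * (c - 1) for _ in range(c - 1)]
    for m in range(c - 1):
        for n in range(c - 1):
            i, j = m + 1, n
            cells = [M[i][j], M[i - 1][j]]
            if j + 1 < c:  # right neighbor exists except in the last column
                cells.append(M[i][j + 1])
            T_span[m][n] = [sum(v[k] for v in cells) / len(cells)
                            for k in range(dim)]
    return T_span
```

The shrink from c to c−1 per axis matches the kernel size of 2 used for the span-encoder CNN in the experimental setup.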

Span-based Table Encoding
After obtaining the representations of span pairs, the table encoder with MDRNN is used to model the interactions between different span pairs and obtain the interacted table M^span. The pair in the m-th row and n-th column is defined as:

M^span_{m,n} = GRU(T^span_{m,n}, T^span_{m−1,n}, T^span_{m,n−1}),

and the predicted label distribution ŷ^span_{m,n} is computed as:

ŷ^span_{m,n} = Softmax(W^span M^span_{m,n} + b^span).
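The labeling of a single table cell, i.e. a linear layer followed by a softmax over the label set, can be sketched as follows (`predict_labels` is our own illustrative helper; the weights W and b are toy values, not learned parameters):

```python
import math

def softmax(z):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict_labels(cell, W, b, labels=("A", "O", "P", "N")):
    # logits_k = W[k] . cell + b[k], one logit per label in Y_P.
    logits = [sum(w * x for w, x in zip(W[k], cell)) + b[k]
              for k in range(len(labels))]
    return dict(zip(labels, softmax(logits)))
```

For the OTE task the same computation runs over the six labels of Y_T instead of the four labels of Y_P.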

The cross-entropy losses of the A-O pair or A-O-S triplet prediction for the word-based table and the span-based table are:

L^word = − Σ_{i=1}^{c} Σ_{j=1}^{c} Σ_{k∈Y} I(y_{i,j} = k) log ŷ^word_{i,j|k},
L^span = − Σ_{m=1}^{c−1} Σ_{n=1}^{c−1} Σ_{k∈Y} I(y_{m,n} = k) log ŷ^span_{m,n|k},

where y_{i,j} and y_{m,n} are the ground-truth distributions of the A-O pair or A-O-S triplet, and I(·) is the indicator function. Y is the label set (Y_P or Y_T) mentioned in Subsection 3.2: Y_P for the OPE task and Y_T for the OTE task.
To integrate the labels of the word-based table and the span-based table, we average the two predicted label distributions at each aligned position:

ŷ_{i,j} = (ŷ^word_{i,j} + ŷ^span_{m,n}) / 2,

where i = m + 1, j = n, and m, n ∈ (1, c − 1), i, j ∈ (1, c).
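The integration step, averaging the word-table and span-table label distributions at the aligned positions i = m + 1, j = n, can be sketched as follows (`integrate` is our own illustrative helper, with 0-based indices):

```python
def integrate(y_word, y_span):
    # y_word: c x c x |Y| word-table label distributions.
    # y_span: (c-1) x (c-1) x |Y| span-table label distributions.
    # Span cell (m, n) aligns with word cell (m + 1, n); aligned cells
    # get the average of the two distributions, the rest are unchanged.
    out = [[cell[:] for cell in row] for row in y_word]  # deep-ish copy
    c = len(y_word)
    for m in range(c - 1):
        for n in range(c - 1):
            i, j = m + 1, n
            out[i][j] = [(a + b) / 2
                         for a, b in zip(y_word[i][j], y_span[m][n])]
    return out
```

Cells without an aligned span prediction simply keep their word-table distribution, so the output remains a valid c × c table of distributions.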
The final loss of our model for a sentence C is a weighted sum of L^word and L^span with an L2-regularization term:

L = α L^word + β L^span + λ ||θ||^2,

where α and β are the loss weights, λ is the regularization coefficient and θ denotes the model parameters.

Experiments

Datasets and Experimental Setting
We utilize the four AFOE datasets built by Wu et al. [10], which are aligned with the TOWE datasets [26] and the SemEval Challenge datasets [2][3][4], to comprehensively evaluate the effectiveness of our proposed model. The summary statistics of the AFOE datasets are shown in Table 1. Precision, recall and F1 score are used as the evaluation metrics. For the word encoder, we use pre-trained word vectors with the double embeddings of DE-CNN [20], which are composed of 300-dimensional GloVe [28] embeddings and 100-dimensional fastText [29] embeddings. The dimension of the LSTM cell is set to 50 and the kernel size of the CNN in the span encoder is 2. For the table encoder, we use an MDRNN with GRU. For training, we use Adam [30] for optimization and set the learning rate to 0.005, the batch size to 32 and the dropout rate to 0.5. Our model is evaluated on the four test sets and the average result of five runs is reported.

Comparative Methods
Since there are few existing works on AFOE, we compare our model with 12 combined models to evaluate its performance. Note that some of the comparative methods are designed only for the OPE task or the OTE task.
• CMLA+Dis-BiLSTM. CMLA [31] is an end-to-end model with a coupled multi-layer attention network. Dis-BiLSTM is built on a BiLSTM network for relation extraction. CMLA+Dis-BiLSTM uses CMLA to co-extract aspect terms and opinion terms, and uses Dis-BiLSTM to pair them.
• CMLA+C-GCN. C-GCN [32] is an extension of the graph convolutional network (GCN) tailored for relation extraction. CMLA+C-GCN uses CMLA [31] to co-extract aspect terms and opinion terms, and pairs them with C-GCN.
• RINANTE+C-GCN. This model uses the RINANTE [33] model to co-extract aspect terms and opinion terms, and then pairs them with C-GCN [32].
• GTS-BiLSTM [10]. The model is an end-to-end table labeling approach based on the Grid Tagging Scheme (GTS). Specifically, it first uses a BiLSTM word encoder to generate the representation of each word, and then applies the inference strategy in GTS to exploit indications between all pairs.
• GTS-CNN [10]. Different from the GTS-BiLSTM model, this model uses a CNN in the encoding layer.
• Li-unified-R+PD. The model first extracts aspect terms and opinion terms and predicts sentiment polarity with the unified tagging scheme proposed by [35]. Then it detects the relation with the Pair relation Detection (PD) proposed by [8].
• Peng-unified-R+PD [8]. The model is a two-step model, which first extracts aspect terms and predicts aspect sentiment, and then encodes all candidate pairs for the final classification.
• Peng-unified-R+IOG. The model first extracts aspect terms and predicts aspect sentiment, and then pairs them with a binary classifier with a softmax layer. Peng-unified-R+IOG uses the first step of Peng et al. [8] to extract the (aspect, sentiment) pairs, then combines them with IOG [26].
• IMN+IOG. The model uses the IMN [36] model for aspect term and opinion term co-extraction and aspect-level sentiment classification, and then uses the IOG [26] model to pair the extracted terms.

Result Analysis
We compare the results of our model with the baseline models on four datasets. Since the AFOE task is divided into two parts, i.e. OPE and OTE, the experimental results are shown in Table 2 and Table 3. The best and second-best results are in bold and underlined, respectively. Gain reports the relative improvement between SpanMTL and the best/second-best results.
For the opinion pair extraction (OPE) task, the F1-score of the SpanMTL model rises to 71.84%, 67.87% and 72.67% on the 14res, 15res and 16res datasets, compared with second-best results of 71.74%, 65.39% and 71.42%. This improvement indicates that our model's focus on span extraction and on the interaction information between pairs is very helpful for the OPE task. However, on the 14lap dataset our model is second-best, with an F1-score 0.86% below the best model, GTS-CNN. This may be because the 14lap dataset belongs to the laptop domain rather than the restaurant domain of the other three datasets. Compared with the OPE task, the opinion triplet extraction (OTE) task not only needs to extract the A-O pair, but also needs to predict the sentiment polarity of the aspect term. For the OTE task, the F1-scores of our model increase by 0.81% and 2.33% on the 15res and 16res datasets compared with the second-best values, while decreasing by 0.45% and 0.48% on the 14lap and 14res datasets compared with the best values. This slight decrease on the 14lap and 14res datasets could be caused by the unnecessary non-phrase information introduced when representing all spans in the input sentence.
In summary, thanks to the learning of span information and the table encoding, our model can effectively extract A-O pairs and A-O-S triplets. However, our model shows some dependence on the context domain, resulting in lower results on the 14lap dataset, which is also a direction for improvement in our future work.

Results of ATermE and OTermE task
To further analyze the effects of different methods, we compare the performance on two subtasks, Aspect Term Extraction (ATermE) and Opinion Term Extraction (OTermE). In order to ensure the fairness of the results, we follow the best baseline [10] and compare the results on the 14res and 15res datasets.
The results of ATermE and OTermE are shown in Table 4. On the one hand, it can be observed that the F1 scores of SpanMTL on ATermE and OTermE are both improved on the 15res dataset, indicating that SpanMTL is also helpful to the subtasks when solving the AFOE task. On the other hand, the result on the ATermE task is better than that on OTermE. This may be because the proportion of span-based aspect terms is higher than that of span-based opinion terms, and our method is designed to solve the span labeling problem. This illustrates the effectiveness of SpanMTL in solving the ATermE and OTermE subtasks.

Ablation Study
In order to further verify the contributions of the table encoder and the span processing method to our full model, we conduct an ablation study. All results are shown in Table 5 and Table 6, where "SpanMTL w/o table" is a variant of our model without the table encoder and "SpanMTL w/o span" is a variant without the span encoding. Removing the table encoding process results in performance degradation, indicating that table encoding between aspect-opinion pairs or triplets is useful for prediction. Similarly, the F1-score of "SpanMTL w/o span" also decreases, which further shows that our emphasis on spans is justified, and that the designed span-based representation learning and span-based table encoding improve the AFOE task.

Case Study
To further illustrate the performance of our model, we present a case study on the OPE task comparing our model with the GTS model [10]. As shown in Table 7, Sentence #1 is selected to evaluate the models on span extraction, and Sentence #2 is selected to evaluate the models on span extraction and multiple A-O pair extraction. For Sentence #1, our model correctly extracts the aspect term and the A-O pair, while the GTS model only extracts the first word of the aspect term. This shows that our improved table labeling method can effectively extract phrases. For Sentence #2, our model successfully extracts the opinion term "great surprises", but the GTS model only extracts the word "great". In our analysis, this result may be caused by the GTS tagging method's ignorance of span information. It is worth noting that there are three A-O pairs in Sentence #2, but both our model and the GTS model extract only two pairs. This shows that our method is still insufficient for multiple A-O pair extraction, which will be a direction of our future work.

Conclusions
In this paper, we propose a Span-based Multi-Table Labeling framework, SpanMTL, for the aspect-oriented fine-grained opinion extraction (AFOE) task. Our method encodes all words in each sentence through two BiLSTMs to obtain aspect- and opinion-specific word representations. Then we obtain the word-based table encoding with a multi-dimensional recurrent neural network (MDRNN). Based on the word-based table, we generate the span encoding with a CNN, which is further processed by another MDRNN to generate the span-based table encoding. The final extractions are obtained by integrating the results of word-based table labeling and span-based table labeling. Experimental results show that our model outperforms the state-of-the-art methods on the 15res and 16res datasets, which contain a large amount of span information. In future work, we will explore the relationship between multiple pairs or triplets in a sentence to further improve the performance of the AFOE task.