A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews

New sentiment words in product reviews are a valuable resource because they come directly from users. Extracting them can improve information services for users and provide theoretical support for related research on edge computing. Traditional methods for extracting new sentiment words generally ignore context and syntactic information, which leads to low accuracy and recall. To tackle this issue, we propose a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. First, the probability that a new word is a sentiment word is calculated from location rules derived from the sequence labeling results, and a candidate set of new sentiment words is obtained according to this probability. Then, the candidate set is supplemented by matching appositive words based on edit distance. Finally, the final set of new sentiment words is obtained through fine-grained filtering, which includes the calculation of point mutual information (PMI) and the difference coefficient of positive and negative corpus (DC-PNC). The experimental results show that the new sentiment words extracted by the proposed method improve the accuracy and recall of sentiment analysis.


Introduction
With the application and development of e-commerce on the Internet, a large number of users post product reviews on shopping platforms. Product reviews can provide consumers and companies with a wealth of information, including objective product descriptions, accurate data statistics and product popularity (Zhao et al. 2018; Singh and Sarraf 2020; Bi et al. 2019; Pankaj and Muskan 2019). Performing sentiment analysis on product reviews accurately and effectively therefore has great practical value. As useful prior knowledge, sentiment words can pave the way for subsequent sentiment analysis. Since the sentiment word is the basic language unit with which people express opinions or attitudes, the extraction of new sentiment words is a crucial task.
Regarding previous work on sentiment word extraction (Zhu et al. 2020), Zhang and Wei (2018) proposed a method for constructing a microblog sentiment dictionary and applied it to the sentiment analysis of microblog texts. Zhu and Pan (2020) combined two coefficients (i.e., microblog importance and time decay) to extract highlighted words and measured the correlation strength between any two highlighted words via compound co-occurrence rates. Different from the previous work, in this paper the process of extracting new sentiment words is gradually refined. According to the framework of this paper, the extraction task can be summarized as follows: dig out a set of candidate new sentiment words at a coarse-grained level and then filter out new sentiment words at a fine-grained level. Considering that traditional methods generally ignore context and syntactic information, a novel approach to extracting new sentiment words from product reviews based on sequence labeling and syntactic analysis is proposed.
To evaluate whether an approach to extracting new sentiment words is effective, two aspects need to be considered. On the one hand, as many new sentiment words as possible should be retained during extraction. On the other hand, the extracted words should have a clear sentiment polarity. In this paper, we retain more new sentiment words during coarse-grained extraction and supplemental extraction, and we ensure that the extracted words have sentiment polarity through fine-grained filtering.
To overcome the above issue, a new data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews is proposed. The process of extracting new sentiment words is divided into three main steps: coarse-grained extraction, supplemental extraction and fine-grained extraction. The system framework is shown in Fig. 1.
• Coarse-grained extraction: Two kinds of sequence labeling are performed on the pre-processed corpus to derive the location rules of old sentiment words. The probability that a new word is a sentiment word is calculated by virtue of the location rules. Then, words that meet the set threshold are selected into the candidate set of new sentiment words.
• Supplemental extraction: Syntax trees are generated from product review texts by means of syntactic analysis, and each syntax tree is traversed to generate strings. The edit distance between strings is used to measure the similarity of syntactic structure. Candidate new sentiment words are then extracted by matching appositive words based on edit distance.
• Fine-grained filtering: The final set of new sentiment words is collected by fine-grained filtering, which includes the calculation of point mutual information (PMI) and the difference coefficient of positive and negative corpus (DC-PNC). The sentiment polarity of each word is then classified as positive or negative.
The main contributions of our work can be summarized as the following three points:
• This paper proposed a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words, which can detect new sentiment words from product reviews effectively.
• This paper proposed a method of judging the sentiment polarity of words based on PMI and DC-PNC, which can determine the sentiment polarity of words accurately.
• The new sentiment words extracted by the proposed method are applied to sentiment analysis, and good experimental results are obtained on several datasets, thus verifying the effectiveness of the proposed method.
The rest of this paper is organized as follows. The related works are introduced in Sect. 2. The specific process of coarse-grained extraction and supplemental extraction of candidate new sentiment words is described in Sect. 3. The fine-grained filtering of new sentiment words is discussed in Sect. 4. The experimental design and analysis are explained in Sect. 5. The conclusion of full text and the outlook for future work are summarized in Sect. 6.

Related works
The goal of extracting new sentiment words is to identify new sentiment units in the process of data processing, so that the subsequent sentiment analysis can be performed more accurately and effectively. In this section, we briefly review the related work from two perspectives, the recognition of new words and the judgment of sentiment polarity.

New word recognition
Regarding the method of new word recognition, Li et al. proposed a DWWP system and used combined mutual information to handle users' invention of new words and conversion of sentiment words. Sarna et al. (2016) applied probabilistic methods to identify new keywords, assign them to groups, and make decisions based on both the existing keywords and the newly extracted ones. He et al. (2019) associated the word co-occurrence probability with word similarity and assumed that the most semantically different words are potential candidates for the anchor words. Yan et al. (2017) proposed an iterative scheme to extract new words, through which distinguishable seed context patterns can be extracted, and introduced dynamic features that characterize the similarity of context patterns. Li et al. (2016) treated new word recognition as a binary classification task and proposed new effective classification features including word embedding, activation distance, and statistical conversion probability. Lee et al. (2019) used mutual information and entropy as the basis of an algorithm that identifies unknown words in multilingual code-switching sentences.

Sentiment polarity judgment
Regarding the method of judging the polarity of sentiment words, Darwich et al. (2020) overcame the inherent problems of dictionary-based generation models and derived the sentiment polarity of term senses by the context-dual-step aware in-gloss matching. Li et al. (2017) performed word embedding based on a set of seed words and automatically inferred multi-dimensional affective representations of words by a regression-based method. Basiri et al. (2020) considered the part-of-speech tags, specified potential terms and employed a comprehensive sentiment lexicon to compute the polarity of sentences. Wu et al. (2017) proposed a new method of merging specific sentiment classifiers in the field of multi-source emotional knowledge training, extracting emotional information from four information sources. Deng et al. (2019a) proposed a novel hierarchical supervision topic model which can capture the sentiment polarity of each word in different topics under hierarchical supervision. Wu et al. (2019) proposed a sentiment classification task for words and classified the sentiment of words according to their hidden representations in sentences. Zhao et al. (2016) applied sentiment-oriented point mutual information (SO-PMI) to judge the sentiment polarity of sentiment words and calculated their emotional intensity. Lee et al. (2018) used association rule mining to extract words that have sentiment polarity. Beigi et al. (2020) proposed a novel approach to constructing a domain-specific sentiment lexicon, in which the combination of a neural network and a sentiment lexicon can adapt word polarities to the target domain without supervision. Deng et al. (2019b) trained a classifier to predict the sentiment polarity of words, using sentiment-aware word embeddings as features.
Based on the existing research, this paper treats the extraction of new sentiment words as a gradually refined process: candidate new sentiment words are first extracted at a coarse-grained level and then filtered at a fine-grained level. We find that the product review corpus has the following characteristics: (1) the syntactic structures of product review texts are highly similar; (2) new sentiment words often appear around product names, product attributes or words of four parts of speech (adjectives, adverbs, nouns and verbs). In this paper, we propose a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews, and we newly define the concept of DC-PNC to judge sentiment polarity. The method improves the extraction of new sentiment words. To some extent, it alleviates the problems of unclear polarity and low accuracy of the extracted sentiment words.

Coarse-grained and supplemental extraction of candidate new sentiment words
A product review corpus is constructed by crawling four kinds of product reviews from the JD Mall platform, including "computer reviews," "laundry detergent reviews," "drawing board reviews" and "tracksuit reviews." After the pre-processing of product review texts, candidate new sentiment words are extracted in two steps: coarse-grained extraction and supplemental extraction.

The pre-processing of product review texts
The pre-processing of the raw product review corpus is shown as Algorithm 1.
(1) Normalized processing: removing some "garbage" comments, including texts that are not related to the product and texts containing slogans or improper intent.
(2) Chinese word segmentation: using the word segmentation tool ICTCLAS to segment the product review corpus.
(3) Contrast deduplication: the results of word segmentation are compared with the old sentiment words and deduplicated in order to obtain a set of new words.

Coarse-grained extraction of candidate new sentiment words
Corpus statistics suggest the following rule: if the context of a word is similar to the context of an old sentiment word, the word is more likely to be a sentiment word. The main idea of this step is therefore to obtain the location rules of old sentiment words first and then use these rules to extract new sentiment words. Specifically, we count the frequency with which old sentiment words appear around two kinds of labels, calculate the probability that a new word appearing around these labels is a sentiment word, and then extract the words that meet the set threshold as candidate new sentiment words.
This part of the work consists of four main steps: sequence labeling, concluding the location rules, calculating the joint probability value, and selecting candidate new sentiment words according to the probability value.

Sequence labeling
Sequence labeling problems (Sun et al. 2020; Lin et al. 2020) in natural language processing include word segmentation, part-of-speech (POS) tagging, named entity recognition (Chen et al. 2020; Wang et al. 2018, 2019), keyword extraction, etc. As long as a specific label set is given, sequence labeling can be performed. Sequence labeling means that for an input sequence $X = x_1, x_2, x_3, \ldots, x_i, \ldots, x_n$, each $x_i$ in $X$ is assigned a certain label, and the output is the sequence $Y = y_1, y_2, y_3, \ldots, y_i, \ldots, y_n$.
According to the experience of language expression and the rules of part-of-speech collocation, we find that words of four parts of speech, product names and product attributes often modify sentiment words. Consequently, we choose them as the labels for sequence labeling. Here, product names and product attributes are collectively called "subject words." Table 1 gives a brief description of sequence labeling, and Fig. 2 shows a brief schematic diagram of sequence labeling in this paper.
Based on the two types of labels, two kinds of sequence labeling tasks need to be performed: subject word tagging and part-of-speech tagging. Subject word tagging marks specific categories of entities, including product names and product attributes. Here, we manually constructed a collection of subject words for labeling. Part-of-speech tagging (Pota et al. 2019) marks the part of speech of each word, which can also be used to analyze the syntactic structure of a sentence. Harbin Institute of Technology's NLP toolkit is used for part-of-speech tagging.

Concluding the location rules
According to the output sequence of the previous step, a statistics-based method is used to calculate the frequency with which old sentiment words appear around each of the two kinds of labels (hereafter, "around" means "within 4 characters"). Here, the distribution of old sentiment words around the two kinds of labels is called the "location rules." To a certain extent, the location rules of old sentiment words reflect the context in which sentiment words often appear. Therefore, the purpose of this step is to use the location rules of old sentiment words to pave the way for mining new sentiment words.
The ratios of old sentiment words around the two types of labels are $P(a)$ and $P(b_i)$, respectively, as shown in formulas (1) and (2).
$$P(a) = \frac{t_a}{T} \tag{1}$$
$$P(b_i) = \frac{t_{b_i}}{T} \tag{2}$$
where $t_a$ and $t_{b_i}$ represent the frequency with which old sentiment words appear around subject words and words of the four parts of speech, respectively ($i = 1, 2, 3, 4$ represent adjectives, nouns, adverbs and verbs, respectively). $T$ is the total number of times that old sentiment words appear in the raw product review corpus.
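As a rough illustration of how the frequencies $t_a$, $t_{b_i}$ and $T$ could be counted, the following Python sketch derives the location rules from already labeled sentences. The token format, the label names (`SUBJ`, `ADJ`, `NOUN`, `ADV`, `VERB`, `OTHER`) and the function name `location_rules` are hypothetical and not taken from the paper; "within 4 characters" is interpreted here as the labeled token overlapping the 4-character span immediately before or after the sentiment word, which is only one possible reading.

```python
from collections import Counter

WINDOW = 4  # "around" means within 4 characters, as stated in the text


def location_rules(sentences, old_sentiment_words, window=WINDOW):
    """Return (probs, total): probs[label] = t_label / T, total = T.

    sentences: list of sentences, each a list of (word, label) pairs.
    Labels are hypothetical: "SUBJ" for subject words, "ADJ"/"NOUN"/"ADV"/"VERB"
    for the four parts of speech, "OTHER" for everything else.
    """
    counts = Counter()
    total = 0
    for tokens in sentences:
        # Character span [start, end) of every token in the sentence.
        spans, pos = [], 0
        for word, label in tokens:
            spans.append((pos, pos + len(word), label))
            pos += len(word)
        for i, (word, _) in enumerate(tokens):
            if word not in old_sentiment_words:
                continue
            total += 1
            start, end, _ = spans[i]
            nearby = set()
            for j, (s, e, label) in enumerate(spans):
                if j == i or label == "OTHER":
                    continue
                # Neighbour overlaps the window just before or after the word.
                if e > start - window and s < end + window:
                    nearby.add(label)
            counts.update(nearby)  # each label counted once per occurrence
    probs = {label: counts[label] / total for label in counts} if total else {}
    return probs, total


sentences = [
    [("屏幕", "SUBJ"), ("很", "ADV"), ("清晰", "ADJ")],
    [("物流", "NOUN"), ("太", "ADV"), ("慢", "ADJ")],
]
probs, total = location_rules(sentences, {"清晰", "慢"})
```

Counting each label at most once per occurrence of an old sentiment word keeps every ratio between 0 and 1; whether the original statistics were collected this way is an assumption.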

Calculating the joint probability of new words
The probabilities that old sentiment words appear around subject words and around words of the four parts of speech are $P(a)$ and $P(b_i)$, respectively. Therefore, the probability that a new word appearing around subject words or around words of the four parts of speech is a sentiment word is also set to $P(a)$ and $P(b_i)$. Each new word may appear around subject words, around words of the four parts of speech, or both. Thus, a "weighted summation" strategy is adopted to calculate the joint probability that a new word is a sentiment word. The calculation formula of the joint probability is shown in (3), and the weights in the probability formula are shown in (4) and (5), where $P(a)$ and $P(b_i)$ represent the probability that the new word is a sentiment word when it appears around subject words and words of the four parts of speech, respectively, and $t_a$ and $t_{b_i}$ represent the frequency with which old sentiment words appear around subject words and words of the four parts of speech.

Selecting candidate new sentiment words
The greater the joint probability of a word, the more likely it is to be a new sentiment word. Therefore, the goal of this step is to extract words with a higher joint probability. The joint probability calculated in the previous step is compared with the set threshold. If the joint probability of a word exceeds the threshold, the word is added to the candidate set of new sentiment words. Otherwise, the word is removed. Algorithm 2 shows the procedure for the coarse-grained extraction of new sentiment words according to the results of sequence labeling.
Algorithm 2 is comprised of four parts. The first part (Steps 1-5) annotates words of the four parts of speech, product names, product attributes and old sentiment words in the pre-processed product review corpus. The second part (Steps 6-9) concludes the location rules of old sentiment words. If an old sentiment word appears within 4 characters of a word of the four parts of speech, a product name or a product attribute, the corresponding frequency is increased by one. The probability that old sentiment words appear around the two kinds of labels is then calculated by the formulas. The third part (Steps 10-13) calculates the joint probability that a new word appearing around the two kinds of labels is a sentiment word, according to the location rules of old sentiment words obtained in the previous step. The fourth part (Steps 14-17) adds the words that meet the set threshold to the candidate set of new sentiment words.
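The last two parts of Algorithm 2 can be sketched as follows. This is only an illustration of the "weighted summation" idea: the weights used here (each label's old-word frequency divided by the sum of all label frequencies) are an assumption, since the exact weights of formulas (4) and (5) are not reproduced in this text, and the names `joint_probability`, `select_candidates`, the label names and the threshold value 0.3 are hypothetical.

```python
def joint_probability(labels_near_word, probs, freqs):
    """Weighted sum of P(label) over the labels observed around a new word.

    ASSUMPTION: the weight of a label is t_label / sum of all t_label;
    the paper's own weights (formulas (4) and (5)) may differ.
    """
    total_freq = sum(freqs.values())
    if total_freq == 0:
        return 0.0
    return sum(
        freqs[label] / total_freq * probs[label]
        for label in labels_near_word
        if label in probs
    )


def select_candidates(new_word_labels, probs, freqs, threshold):
    """Keep the new words whose joint probability exceeds the threshold."""
    return {
        word
        for word, labels in new_word_labels.items()
        if joint_probability(labels, probs, freqs) > threshold
    }


# Hypothetical location rules: frequencies t_label and ratios t_label / T, T = 100.
freqs = {"SUBJ": 60, "ADJ": 20, "ADV": 20}
probs = {"SUBJ": 0.6, "ADJ": 0.2, "ADV": 0.2}
# Labels observed around each new word in the corpus (hypothetical).
new_word_labels = {"上头": {"SUBJ", "ADV"}, "的": {"ADJ"}}
candidates = select_candidates(new_word_labels, probs, freqs, threshold=0.3)
```

With these made-up numbers, a word seen near both subject words and adverbs scores higher than a word seen only near adjectives, so only the former passes the threshold.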

Supplemental extraction of candidate new sentiment words
If the syntactic structure around a word is similar to the syntactic structure around an old sentiment word, the word is more likely to be a sentiment word. The syntax tree is a graphical representation of sentence structure, which helps in understanding the syntactic structure in which a word occurs. The main idea of this step is therefore to find candidate new sentiment words with the help of the syntax tree and syntactic structure similarity. In this step, we introduce the concept of "appositive words" and propose a method of matching appositive words based on edit distance to extract candidate new sentiment words.
Definition 1: Appositive words. Appositive words are words that occupy the same position component in sentences with similar syntactic structures. They belong to the same category concept, and their meanings are equal and irreplaceable. The appositive word is defined by formula (6). In formula (6), $N$ is the pre-processed product review corpus, $ED$ is the edit distance between strings, which is used to measure structural similarity, and $k$ is the set threshold.
When the ED is smaller than the threshold value $k$, the sentences are deemed to have similar syntactic structures, and the word that occupies the same position as the old sentiment word is considered to be a candidate new sentiment word. For example, "精美 (exquisite)" in "这份礼物很精美 (this gift is exquisite)" and "上头 (a momentary impulse)" in "这种味道很上头 (this taste is very high)" are appositive words of each other.
The process of supplemental extraction of new sentiment words is shown in Fig. 3. The syntax tree of product reviews is generated by syntactic analysis. Syntactic analysis can reflect the semantic modification relationships between sentence components and can therefore capture long-distance collocation information. The traversal path from the root node to the node of an old sentiment word generates a string, which reflects the syntactic structure component in which the word is located. The edit distance (ED) refers to the number of editing operations required to convert one string into another, which can be used to measure the similarity of two strings. Similarly, the edit distance between traversal path strings can be used to measure the similarity of syntactic structure. The smaller the edit distance, the more similar the syntactic structure. Hence, a new method of matching appositive words based on edit distance is proposed, which can be applied to extract candidate new sentiment words. The specific steps of the method of matching appositive words are as follows.
Step1: Generate a syntax tree All of the product reviews are first split into sentences, which is the basis for constructing a syntax tree. Stanford University's natural language processing toolkit Stanza (Peng et al. 2020) is used to perform syntactic analysis on the sentences. Then, the structured information of the syntax tree of each sentence is obtained from the software package.
Step2: Establish a syntactic structure matching template A syntax tree structure table is created as a matching template. The table stores multiple common string representations (e.g., ROOT-IP-VP-ADVP-AD-VP-VA), each of which is the traversal path of an old sentiment word in the syntax tree of the corresponding comment text. The string reflects the syntactic structure component in which the word is located.
Fig. 3 The process of supplemental extraction of new sentiment words
Step3: Calculate the edit distance (ED) Traverse the syntax tree of each sentence and generate the subtrees. For each subtree generated, a string S is also generated by traversal. Calculate the edit distance (ED) between the string S of the subtree and the existing strings in the matching template. When the edit distance is less than the threshold value, the syntactic structures of the two strings are deemed to be similar. Otherwise, the traversal string S of the subtree is added to the matching template.
Step4: Categorize clauses with similar syntactic structure Build a result table and create multiple new keys in the result table. Below the column corresponding to each key value, a list of clauses with similar syntactic structure is stored.
Step5: Extract candidate new sentiment words The sentences in the same list of clauses are aligned according to the syntactic structure. The words occupying the same position as the old sentiment words are regarded as candidate new sentiment words.
Step6: Remove old sentiment words After the above steps are completed, combining with the existing sentiment dictionary, remove the repeated words from the candidate set of new sentiment words.
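Steps 3 and 4 can be illustrated with a minimal sketch of edit distance over traversal path strings. The sketch assumes that a path such as ROOT-IP-VP-ADVP-AD-VP-VA is compared label by label (split on "-") rather than character by character, and that a path joins the first template whose distance is below the threshold; both choices, as well as the names `edit_distance` and `group_by_structure`, are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def group_by_structure(paths, k):
    """Group path strings: a path joins the first template with ED < k,
    otherwise it becomes a new template (as in Step3)."""
    groups = {}
    for path in paths:
        labels = path.split("-")
        for template in groups:
            if edit_distance(labels, template.split("-")) < k:
                groups[template].append(path)
                break
        else:
            groups[path] = [path]
    return groups


paths = [
    "ROOT-IP-VP-ADVP-AD-VP-VA",
    "ROOT-IP-VP-ADVP-AD-VP-VV",
    "ROOT-IP-NP-NN",
]
groups = group_by_structure(paths, k=2)
```

Here the first two paths differ in a single node label and fall into the same group, while the third path starts a new template. The threshold `k=2` is an arbitrary value chosen for the example.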

Fine-grained filtering of candidate new sentiment words
The words extracted by the above steps may have no sentiment polarity, so the judgment of sentiment polarity is still required. Hence, we proposed a new method of judging the sentiment polarity of words based on PMI and DC-PNC.

Point mutual information (PMI)
Point mutual information (PMI) is used to calculate the semantic similarity of two words. The larger the PMI value, the higher the relevance of the two words. The calculation formula is shown in (7).
$$\mathrm{PMI}(word_1, word_2) = \log \frac{P(word_1 \,\&\, word_2)}{P(word_1)\,P(word_2)} \tag{7}$$
where $P(word_1 \,\&\, word_2)$ represents the probability that the two words appear in a review at the same time, and $P(word_1)$ and $P(word_2)$ represent the probabilities that $word_1$ and $word_2$ appear in reviews separately. The semantic similarity between a candidate new sentiment word and the commendatory benchmark words, and between the candidate and the derogatory benchmark words, is calculated separately, and the sentiment polarity of the word is determined by the difference. The calculation formula is shown in (8).
$$\mathrm{SO\text{-}PMI}(word) = \sum_{i} \mathrm{PMI}(word, P_{w_i}) - \sum_{i} \mathrm{PMI}(word, N_{w_i}) \tag{8}$$
where $P_{w_i}$ denotes the commendatory benchmark words and $N_{w_i}$ denotes the derogatory benchmark words. $\mathrm{PMI}(word, P_{w_i})$ represents the semantic similarity between the candidate new sentiment word and a commendatory benchmark word, and $\mathrm{PMI}(word, N_{w_i})$ represents the semantic similarity between the candidate new sentiment word and a derogatory benchmark word. $\mathrm{SO\text{-}PMI}(word)$ represents the sentiment polarity of the word.
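The PMI and SO-PMI calculations can be sketched as follows, estimating each probability as the fraction of reviews that contain the word or word pair. The base-2 logarithm, the convention of returning 0 when two words never co-occur, and the function names `pmi` and `so_pmi` are assumptions of this sketch rather than details given in the paper.

```python
import math


def pmi(word1, word2, reviews):
    """PMI from review-level co-occurrence.

    reviews: list of sets of words, one set per review.
    ASSUMPTION: returns 0.0 when the words never co-occur (the true value
    would be minus infinity), and uses a base-2 logarithm.
    """
    n = len(reviews)
    c1 = sum(1 for r in reviews if word1 in r)
    c2 = sum(1 for r in reviews if word2 in r)
    c12 = sum(1 for r in reviews if word1 in r and word2 in r)
    if c12 == 0:
        return 0.0
    return math.log2((c12 / n) / ((c1 / n) * (c2 / n)))


def so_pmi(word, positive_seeds, negative_seeds, reviews):
    """Similarity to commendatory seeds minus similarity to derogatory seeds."""
    pos = sum(pmi(word, p, reviews) for p in positive_seeds)
    neg = sum(pmi(word, q, reviews) for q in negative_seeds)
    return pos - neg


# Toy corpus (hypothetical): each review is the set of its segmented words.
reviews = [{"上头", "好"}, {"上头", "好"}, {"差"}, {"好"}]
score = so_pmi("上头", ["好"], ["差"], reviews)
```

A positive score indicates that the candidate co-occurs more strongly with the commendatory benchmark words than with the derogatory ones.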

The difference coefficient of positive and negative corpus (DC-PNC)
In this part, we propose a new method for judging the sentiment polarity of words. The sentiment polarity of a candidate word is determined by the ratio of the frequency difference to the frequency sum in the positive and negative corpora, which is defined here as the difference coefficient of positive and negative corpus (DC-PNC). If a word appears frequently in the positive corpus and rarely in the negative corpus, we consider its sentiment polarity to be positive, and vice versa. The value of DC-PNC ranges from −1 to 1. The closer its absolute value is to 1, the more likely the word is to have sentiment polarity. The specific definition is:
$$\mathrm{DC\text{-}PNC}(word) = \frac{F_{pos}(word) - F_{neg}(word)}{F_{pos}(word) + F_{neg}(word)} \tag{9}$$
where $F_{pos}(word)$ and $F_{neg}(word)$ represent the number of times that the word appears in the positive corpus and the negative corpus, respectively.
$$d(word) = \begin{cases} 1, & \mathrm{DC\text{-}PNC}(word) \ge a \\ -1, & \mathrm{DC\text{-}PNC}(word) \le b \\ 0, & \text{otherwise} \end{cases} \tag{10}$$
where $d(word)$ represents the sentiment polarity of a candidate new sentiment word. If $d(word) = 1$, the candidate is added to the positive set of new sentiment words. If $d(word) = -1$, the candidate is added to the negative set of new sentiment words. Otherwise, the candidate is not collected into the final set of new sentiment words.
Formula (10) contains the parameters $a$ and $b$, which need to be tuned. In Sect. 5, the tuning experiment for $a$ and $b$ was performed with ten groups of parameter values. Since the best performance of 75.6% (F-measure) was obtained when $a = 0.8$, $b = -0.8$, these values are chosen for $a$ and $b$.
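A minimal sketch of the DC-PNC judgment is given below. It follows the textual definition (frequency difference divided by frequency sum) and assumes that a word is judged positive when its DC-PNC is at least $a$ and negative when it is at most $b$; the inclusive comparisons, the handling of words that appear in neither corpus, and the function names are assumptions of this sketch.

```python
def dc_pnc(f_pos, f_neg):
    """Difference coefficient of positive and negative corpus, in [-1, 1].

    f_pos, f_neg: counts of the word in the positive and negative corpus.
    ASSUMPTION: a word seen in neither corpus gets 0.0 (no polarity).
    """
    total = f_pos + f_neg
    if total == 0:
        return 0.0
    return (f_pos - f_neg) / total


def polarity(f_pos, f_neg, a=0.8, b=-0.8):
    """Return 1 (positive), -1 (negative) or 0 (discarded).

    a and b default to the tuned values reported in the text;
    the inclusive thresholds (>= a, <= b) are an assumption.
    """
    score = dc_pnc(f_pos, f_neg)
    if score >= a:
        return 1
    if score <= b:
        return -1
    return 0
```

For example, a word that appears 95 times in positive reviews and 5 times in negative reviews has a DC-PNC of 0.9 and is kept as positive, whereas a word with counts 60 and 40 has a DC-PNC of 0.2 and is discarded.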
Algorithm 3 shows the procedure for filtering new sentiment words based on PMI and DC-PNC.
Algorithm 3 is comprised of two parts. In the first part (Step 1-9

Experimental data
There are various e-commerce platforms where users can express comments on a product or service. The JD Mall website is used as the source for crawling product review data. Considering the diversity of products and the coverage of customers, four kinds of product review, including "computer review," "laundry detergent review," "drawing board review" and "tracksuit review," are collected to construct a product review corpus. In this paper, the product review corpus of 20,000 reviews is divided into two disjoint sets: Training Set and Test Set. The Training Set of 4000 reviews is used to extract new sentiment words, and the Test Set of 16,000 reviews is used for sentiment classification to verify the effectiveness of new sentiment words extracted by the method in this paper.
In the meantime, the product reviews of the Training Set and Test Set need to be labeled in advance. Generally, sentiment classification involves three sentiment polarities: positive, negative and neutral. Since comments only contain sentiment words with positive and negative polarity, in this paper all comment texts are classified into two sentiment polarities: positive and negative. Comments under the "praise review" label are assigned to the positive review data set, and comments under the "bad review" label are assigned to the negative review data set. In addition, for comments under the "middle review" label and other reviews, the labeling of sentiment polarity is completed manually. Table 2 and Table 3 show the distribution of the Training Set and Test Set, respectively.

Experimental performance evaluation index
Precision (P), recall (R) and F-measure (F) are used as experimental performance evaluation indexes. The formulas are shown in (11)-(13).
$$P = \frac{j_t}{j_f} \tag{11}$$
$$R = \frac{j_t}{j_s} \tag{12}$$
$$F = \frac{2 \times P \times R}{P + R} \tag{13}$$
where $j_t$ represents the number of product reviews of the category judged correctly, $j_f$ represents the number of product reviews judged as the category, and $j_s$ represents the number of product reviews that should be judged as the category. The category is either positive or negative.
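The three indexes can be computed per category as in the following sketch, where $j_t$, $j_f$ and $j_s$ are counted from parallel lists of predicted and manually labeled categories. The F-measure is taken to be the harmonic mean of precision and recall, which is the usual definition but is an assumption here; the function name and the label strings are hypothetical.

```python
def precision_recall_f(predicted, gold, category):
    """Return (P, R, F) for one category from parallel label lists."""
    # j_t: reviews of the category judged correctly
    j_t = sum(1 for p, g in zip(predicted, gold) if p == category and g == category)
    # j_f: reviews judged as the category
    j_f = sum(1 for p in predicted if p == category)
    # j_s: reviews that should be judged as the category
    j_s = sum(1 for g in gold if g == category)
    p = j_t / j_f if j_f else 0.0
    r = j_t / j_s if j_s else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


predicted = ["pos", "pos", "neg", "pos"]
gold = ["pos", "neg", "neg", "pos"]
p, r, f = precision_recall_f(predicted, gold, "pos")
```

In this toy example, three reviews are judged positive, two of them correctly, and two reviews are truly positive, so precision is 2/3, recall is 1 and the F-measure is 0.8.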

Experimental methods
Product review data provided by the JD Mall shopping platform are used in the experiments. The specific steps of the experimental design are as follows:
Step1: Constructing the product review dataset Product reviews were crawled from JD Mall as the experimental dataset, including "computer review," "laundry detergent review," "drawing board review" and "tracksuit review."
Step2: Pre-processing the product review corpus Corresponding to Algorithm 1, the product review corpus is normalized by removing some "garbage" comments, special symbols and stop words; the ICTCLAS Chinese word segmentation tool is used for word segmentation.
Step3: Extracting new sentiment words from the Training Set This step applied the new method proposed in this paper to extract new sentiment words from the Training Set, including coarse-grained extraction, supplemental extraction and fine-grained filtering.
Step4: Performing sentiment classification on the Test Set This step adopts the sentiment lexicon-based method to perform sentiment classification (Manek et al. 2017; Lu and Wu 2019; Zhang et al. 2021), which accumulates the weights of the sentiment words appearing in a sentence and determines whether the opinion is positive or negative according to the accumulated value.
Step5: Comparing the experimental results Based on the results of sentiment classification on the Test Set, a comparison was made with the previous labeled results on the Test Set to measure the accuracy.
To verify the validity of the method of extracting new sentiment words, we designed two kinds of experiments for comparison. Here, the sentiment dictionary of Dalian University of Technology is abbreviated as DUTIR, and the set of new sentiment words extracted from Step3 of the experiment in this paper is called NSW.
• Experiment I (DUTIR) makes use of the sentiment dictionary of Dalian University of Technology (DUTIR) to perform sentiment classification on product reviews in the Test Set. The precision, recall and F-measure are calculated, respectively.
• Experiment II (DUTIR+NSW) makes use of the new sentiment dictionary to perform sentiment classification on product reviews in the Test Set, which combines the sentiment dictionary of Dalian University of Technology (DUTIR) with the set of new sentiment words (NSW) extracted by the new method. The precision, recall and F-measure are again calculated, respectively.
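The lexicon-based classification of Step4 and the difference between the two experimental settings can be illustrated with a small sketch. The lexicon entries, the weights of ±1, the tie-breaking rule (a score of zero is treated as negative) and the function name `classify` are all hypothetical and serve only to show how adding NSW to DUTIR can change a decision.

```python
def classify(tokens, lexicon):
    """Accumulate the weights of sentiment words found in a review.

    ASSUMPTION: a score of exactly 0 is classified as negative; the
    tie-breaking rule is not specified in the text.
    """
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return "positive" if score > 0 else "negative"


# Hypothetical miniature lexicons (word -> weight).
dutir = {"好": 1.0, "差": -1.0}      # stands in for the DUTIR dictionary
nsw = {"上头": 1.0}                  # stands in for the extracted new words
combined = {**dutir, **nsw}          # Experiment II: DUTIR + NSW

review = ["这种", "味道", "很", "上头"]
label_dutir = classify(review, dutir)        # Experiment I setting
label_combined = classify(review, combined)  # Experiment II setting
```

In this example the review contains no word from the base dictionary, so it is only recognized as positive once the new sentiment word is included in the lexicon.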

Experimental results and analysis
The distribution of positive and negative new sentiment words extracted from each kind of product review in the Training Set is shown in Fig. 4a. A total of 1851 words were extracted from the four kinds of product reviews, and 1311 new sentiment words were left after removing the repeated words. The overall proportion of positive and negative words in the set of new sentiment words is shown in Fig. 4b. Positive new sentiment words account for a proportion of 0.679, and negative new sentiment words account for 0.321. Table 4 lists some examples of new sentiment words extracted from the Training Set. As can be seen from Table 4, most of the new sentiment words are extracted correctly and most of them express an obvious sentiment polarity, including some novel Internet buzzwords. The experimental result illustrates that the method in this paper can effectively extract new sentiment words.
Influence of $a$ and $b$: In Sect. 4.2, a new method of judging the sentiment polarity of words is proposed. In formula (10), the judgment of DC-PNC involves the setting of the parameters $a$ and $b$. Ten groups of parameter settings were therefore tested, and the influence of $a$ and $b$ on the experiment is shown in Table 5 and Fig. 5. Figure 5 shows the precision, recall and F-measure of the experiment with different values of $a$ and $b$. Considering the three evaluation indicators, the best performance of 75.6% (F-measure) was obtained when $a = 0.8$, $b = -0.8$. When $a = 0.7$, $b = -0.7$, the recall rate is the best and more new sentiment words can be extracted, but the accuracy is second best. When $a = 1.0$, $b = -1.0$, the decline may be because too few sentiment words meet the threshold condition and some words that are actually sentiment words are filtered out, which leads to a poor effect in the application of sentiment classification.
The experimental comparison results on the positive comments of DataSet5 are shown in Fig. 6a, and those on the negative comments of DataSet5 are shown in Fig. 6b, covering three evaluation indexes: precision, recall and F-measure. Table 6 shows the comparison results obtained with the two methods (DUTIR and DUTIR+NSW) on the five datasets.
According to the comparison of the two methods in Fig. 6 and Table 6, it can be concluded that the method based on sequence labeling and syntactic analysis performs well in extracting new sentiment words from product reviews. A more specific analysis follows.
(1) The validity of the new sentiment words extracted by the method is demonstrated. As can be seen from Fig. 6a, b, compared with the results of Experiment I (DUTIR), the overall effect of Experiment II (DUTIR+NSW) improved after adding the new sentiment words extracted by the proposed method. For positive comments in DataSet5, the precision, recall rate and F-measure increased by 17.3%, 19.5% and 17.9%, respectively. For negative comments in DataSet5, the precision, recall rate and F-measure increased by 5.9%, 5.5% and 5.7%, respectively.
(2) The new sentiment words extracted by the proposed method can improve the recall of sentiment analysis. The candidate word set was obtained with a hybrid of Algorithm 2 and the newly proposed method of matching appositive words based on edit distance. Multiple kinds of feature information were taken into account, such as word position, context and syntactic structure, which helps to extract potential new sentiment words more comprehensively. As can be seen from Table 6, the recall rate of the experiment has been improved to some extent.
(3) The new sentiment words extracted by the proposed method can improve the accuracy of sentiment analysis. PMI and DC-PNC are combined in Algorithm 3 to filter the candidate set of words and obtain the final set of new sentiment words. To some extent, this combination addresses the low screening precision of existing methods. As can be seen from Table 6, the accuracy of sentiment classification has been improved by applying the extracted new sentiment words.
In general, the experimental results indicate that in the field of product reviews, the data processing method based on sequence labeling and syntactic analysis can extract new sentiment words effectively, and provide effective help for sentiment analysis of product reviews.

Conclusions
In regard to the issue that traditional methods for extracting new sentiment words generally ignore the context and syntactic information, a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews is proposed. The method is mainly divided into three stages: coarse-grained extraction, supplemental extraction and fine-grained filtering. The main contributions of this paper can be summarized as three aspects.
(1) Subject words tagging and part-of-speech tagging are combined to label old sentiment words, which can capture the location rules of old sentiment words, and the location rules can be used to extract new sentiment words. This step considered the context of sentiment words, which can extract candidate new sentiment words at a coarse-grained level more accurately.
(2) This paper proposed a method of matching appositive words based on edit distance, which mainly uses the similarity of syntactic structure to extract more new sentiment words. This method solved the problem of ignoring structured syntactic information in traditional methods. Meanwhile, the scale of the set of new sentiment words has also been expanded.
(3) This paper introduced the new concept of the difference coefficient of positive and negative corpus (DC-PNC) to judge the sentiment polarity of words. To a certain extent, the combination of PMI and DC-PNC improved the screening precision of new sentiment words.
In future work, an unsupervised approach will be explored for the automatic recognition of new sentiment words, with the aim of improving recognition performance. As research related to edge computing, the extraction of new sentiment words from product reviews is expected to make effective use of data resources to provide information services for users, so as to maximize the value of the data.