A tourism preference construction model based on complex network analysis

Abstract: Online travel notes contain a wealth of information about tourists' behavior, which provides a new way to study tourists' preferences. Mining the text of online travel notes accurately and efficiently has therefore become the key to researching tourists' preferences. Drawing on the large number of travel notes accumulated on the Internet, this paper introduces the theory and technology of text mining into the study of tourists' preferences. The main work is as follows: (1) a tourists' preference model is constructed using the complex network method; (2) travel notes written by tourists to Sanya are crawled and analyzed as an example. The approach not only improves the data quality available to traditional preference research but also provides a new method for grasping tourists' preferences more accurately.

tourism and analyzed their advantages and disadvantages. (2) An effective tourists' preference model based on complex networks is proposed for tourism destination management. (3) The performance of the proposed model is analyzed and the mined tourists' preferences are validated.
The rest of this paper is organized as follows. Section 2 discusses related work. Section 3 presents the design of the tourists' preference model based on complex networks. Section 4 reports the results obtained with the proposed model, and Section 5 concludes the paper with a summary and the research limitations.

Research progress of tourists' preferences
Research on tourism preference outside China covers a wide range of topics. It involves not only direct analysis of preferences for tourist destinations and reception facilities and the preferences of special groups, but also the relationship between tourism preference and tourists' needs, gender, the career choices of tourism majors, the work preferences of tourism agencies, and so on [1]. In terms of research methods, foreign scholars attach great importance to quantitative techniques for empirical analysis. The multinomial logit model (MNL) [2], conditional logit model (CLM) [3], nested logit model (NL) [4], and mixed logit model (ML) [5] have been used to analyze tourists' or residents' preferences for tourism, leisure, recreation, and reception facilities. Some researchers have also made full use of the multidimensional scaling approach [6], the thematic apperception test (TAT) [7], fuzzy set theory, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) [8], and decision tree analysis [9] for the same purpose.
In China, research on tourism preference has relied mainly on questionnaire surveys, with the resulting data quantified as histograms or pie charts and supplemented by statistical analysis. Based on market survey data of American tourists in six popular tourist cities in China, Ma Yaofeng et al. studied the preference patterns of American tourists visiting China [10]. Through a questionnaire survey, Wu Bihu et al. identified the recreational behavior and preference characteristics of local residents around Hangzhou [11]. Taking inbound tourists to Xi'an as an example, Bai Kai et al. studied tourists' shopping preferences and behavior patterns [12]. Chen Nan et al. analyzed the relationship between tourism risk perception and tourism behavior preference through empirical investigation [13]. Sun Gennian established a regression model of international tourism payments, revealing the correspondence between tourism payments at various levels and GNP per capita [14]. Tian Xiangli et al. identified tourists' cognitive and selection preferences in the Qinling Scenic Spot in Shaanxi from the perspective of differences among source groups [15].

Research on the application of complex network in Tourism
Network analysis and tourism are both young disciplines, and the application of network theory and analysis methods in tourism research started late. Foreign scholars applied network analysis to tourism research earlier, beginning in the 1980s with studies of the relationship between recreational resources and their users. Since then, with the development of network theory, network analysis has received growing attention in tourism research and has been applied by more and more scholars [16]. In the 21st century, the increasingly complex interaction among the factors of the tourism system has prompted researchers to examine tourism phenomena from the perspective of social networks, which has made network analysis a hot field in tourism research. Scholars in China and abroad have begun to pay attention to the application of network science, especially social network analysis, in tourism research. In terms of specific progress, social network research in the field of tourism mainly focuses on tourism destination government [17], tourism policy networks [18], tourism cooperation [19,20], the characteristics of tourism route networks [21,22], tourists' behavior [23], tourism enterprises [24,25], and other related issues [26,27].
Generally speaking, the application of complex network analysis in the field of tourism has focused on two aspects. One is the use of network analysis to study the development of business networks and the internal relationships between organizations. The other is its application to tourism policy, including the public-private relationships among stakeholders and the structure of tourism governance. The scope of research is still expanding. The types of network structure studied include communication networks, virtual networks, and cooperative networks. The network measures used include density, structural holes, strong/weak ties, clustering, and efficiency [28].

Platform selection
This paper chose Sanya as the research sample; it is one of the most popular cities in China. As the largest online travel website in China, Trip has a total of 250 million users and 5 million daily active users in its community, and it has collected 30 million travel notes and hotel reviews from real users. Therefore, this paper chose Trip.com Group (www.trip.com) as the source website for the travel notes of the sample city.

The construction of preference complex network node
First, we took the whole text set as the object and built a dictionary of all the nouns in it; that is, every distinct noun in the text set was encoded. Second, we calculated the word frequency of each noun in the dictionary. In text topic mining, words whose frequency is too high or too low do not discriminate between topics. Therefore, this paper eliminated the 5% of words with the highest frequency and the 5% with the lowest frequency from the dictionary. Finally, we extracted the top 50% of the remaining words to build a new dictionary, whose entries are the initial nodes of the complex network. The dictionary of nouns obtained after word frequency screening is denoted Dic.

The construction of preference complex network edge
This step mainly addresses the construction of edges between nodes and the algorithm for assigning edge weights. Distance is defined in terms of the number of other sentences between the sentences in which two words are located. If two words appear in the same sentence, that is, they co-occur, the distance between them is 1. By analogy, if there are n sentences between the sentences in which the two words are located, the distance between the two words is n + 1.
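The frequency screening and the sentence distance described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the nouns are assumed to have been extracted already, and the 5% and 50% rules are read as "drop 5% of the ranked vocabulary at each end, then keep the more frequent half of what remains". The distance follows the definition literally, so words in the same sentence and words in adjacent sentences (n = 0) both get distance 1.

```python
from collections import Counter


def build_node_dictionary(noun_docs, trim=0.05, keep=0.5):
    """Screen nouns by frequency and return the node dictionary.

    noun_docs: list of travel notes, each given as a list of nouns.
    trim: share of the ranked vocabulary dropped at each end (assumed reading).
    keep: share of the remaining vocabulary kept, most frequent first.
    """
    freq = Counter(noun for doc in noun_docs for noun in doc)
    ranked = [word for word, _ in freq.most_common()]  # descending frequency
    cut = int(len(ranked) * trim)
    trimmed = ranked[cut:len(ranked) - cut]
    return trimmed[:int(len(trimmed) * keep)]


def sentence_distance(i, j):
    """Distance between two words located in sentences i and j of one note.

    Same sentence -> 1; n sentences in between -> n + 1.
    """
    return max(1, abs(i - j))
```

With 20 distinct nouns of strictly decreasing frequency, the sketch drops the single most and least frequent nouns and keeps the 9 most frequent of the remaining 18.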
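Let the text set of all online travel notes be $D = (d_1, d_2, \ldots, d_n)$, where the text $d_i$ is composed of $k$ sentences, namely $d_i = (s_1, s_2, \ldots, s_k)$. The degree of association of two nodes $w_a$ and $w_b$ in the travel note $d_i$ is

$$r_i(w_a, w_b) = \sum_{t} N_i^{t}(w_a, w_b)\,\omega_t,$$

where $N_i^{t}(w_a, w_b)$ is the number of times that the nodes $w_a$ and $w_b$ appear in the $i$-th travel note at distance $t$, and $\omega_t$ is the weight of one such appearance, which is $1/t$ in this paper. That is, if the nodes $w_a$ and $w_b$ appear three times at distance 3, their degree of association is the same as that of two nodes that appear once at distance 1 ($3 \times 1/3 = 1 \times 1/1$).
After calculating the degree of association of each pair of nodes $w_a$, $w_b$ in every text, the degrees over all the texts in the text set are summed to obtain the total correlation degree $R$ of the nodes:

$$R(w_a, w_b) = \sum_{i=1}^{n} r_i(w_a, w_b).$$

The values $R(w_a, w_b)$ for all pairs of nodes form the correlation matrix.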
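As a rough illustration of the edge weighting, the following sketch sums the 1/t weights over every pair of node occurrences in every travel note. The function name and the input layout (a note as a list of sentences, a sentence as a list of words) are assumptions made for the example, as is the choice to count each pair of occurrences once; the paper does not spell out these details.

```python
from collections import defaultdict
from itertools import combinations


def total_association(notes, nodes):
    """Return {(word_a, word_b): R} summed over all travel notes.

    notes: list of notes; each note is a list of sentences; each sentence
           is a list of words.
    nodes: words of the screened dictionary (the network nodes).
    """
    nodes = set(nodes)
    total = defaultdict(float)
    for sentences in notes:
        # Every occurrence of a node word, tagged with its sentence index.
        occurrences = [(idx, word)
                       for idx, sentence in enumerate(sentences)
                       for word in sentence if word in nodes]
        for (i, a), (j, b) in combinations(occurrences, 2):
            if a == b:
                continue  # no self-loops
            t = max(1, abs(i - j))  # same sentence -> 1, n between -> n + 1
            total[tuple(sorted((a, b)))] += 1.0 / t
    return dict(total)
```

The resulting dictionary is a sparse form of the correlation matrix: a missing pair means the two words never appear in the same travel note.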

Preference content mining based on Louvain community mining algorithm
After building the preference complex network, we need to mine communities from it. Each mined community is a set of nodes of the complex network constructed above, and each such set represents a preference. In this section, we use the Louvain community mining algorithm on the complex network to obtain the preference content of tourists. The Louvain algorithm is a community discovery algorithm based on modularity. Its main goal is to keep re-dividing communities so that the modularity of the whole network increases continuously. The algorithm uses the modularity gain ∆Q to measure the change in the modularity of the complex network caused by adding a node to a community. The algorithm proceeds as follows.
Step 0: treat each node as a separate community.
Step 1: select a community as the starting point, move an adjacent node into it, and calculate the modularity of the new division. If the difference between the modularity after and before the move is positive, accept the move; otherwise, cancel it.
Step 2: repeat Step 1 until the modularity no longer increases, that is, until the division is locally optimal.
Step 3: take each community mined in Step 2 as a new node, build a new network, and carry out Step 1 and Step 2 again to obtain the community hierarchy, as shown in Fig. 1.
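Steps 0 to 2 above can be sketched in plain Python as follows. This is a simplified illustration, not the implementation used in the paper: it recomputes the full modularity after every trial move instead of using the incremental gain formula, it omits the aggregation of Step 3, and the function names and the edge format (a dictionary from node pairs to weights) are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of a weighted undirected graph.

    edges: {(u, v): weight}, each undirected edge listed once, no self-loops.
    community: {node: community label}.
    """
    m = sum(edges.values())
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges):
    """Steps 0-2: start from singletons, move nodes while Q increases."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    community = {node: node for node in neighbours}      # Step 0
    best_q = modularity(edges, community)
    improved = True
    while improved:                                       # Step 2
        improved = False
        for node in sorted(neighbours):                   # Step 1
            current = community[node]
            best_label = current
            candidates = {community[n] for n in neighbours[node]} - {current}
            for label in sorted(candidates):
                community[node] = label                   # trial move
                q = modularity(edges, community)
                if q > best_q + 1e-12:                    # positive gain only
                    best_q, best_label = q, label
            community[node] = best_label                  # keep the best move
            if best_label != current:
                improved = True
    return community, best_q
```

On a toy network of two triangles joined by a single edge, the sketch separates the two triangles. For real data, a maintained library implementation is preferable because it evaluates the gain incrementally and handles the aggregation step.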
Before community mining, a weight threshold is set for the edges of the preference complex network in order to prune it. If the weight of an edge is less than the threshold, the edge is pruned. If all the edges of a node are pruned, so that the node is isolated from the complex network, the node is pruned as well.
After pruning, we use the Louvain community mining algorithm to mine the preference complex network.
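The pruning rule can be illustrated with a short sketch. The function name and the edge format (a dictionary from node pairs to weights) are assumptions made for the example.

```python
def prune_network(edges, threshold):
    """Drop edges lighter than the threshold, then drop isolated nodes.

    edges: {(u, v): weight} for an undirected network.
    Returns the surviving edges and the set of nodes that still have an edge.
    """
    kept = {pair: w for pair, w in edges.items() if w >= threshold}
    nodes = {node for pair in kept for node in pair}
    return kept, nodes
```

A node whose edges all fall below the threshold does not appear in the returned node set, which matches the rule that isolated points are removed.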

Construction of Preference Content Classification System Based on Image Perception Elements of Tourism Destinations
By combing the relevant theories of tourist destination image perception [29,30], this paper constructs the preference content classification system shown in Fig. 2. In this system, the preference content, preference composition, and preference element levels correspond to the perception elements of tourism destination perception theory, while the preference community and its content correspond to the community mining results. Preference sub-elements connect the two: the elements of each preference community are first categorized top-down according to the preference elements and then, drawing on domain knowledge and experience, summarized bottom-up into sub-elements. Whether a preference element has sub-elements depends on how rich that element is.
Due to the complexity of tourists' perception of a destination, the elements of one preference community may belong to multiple preference sub-elements, and one preference sub-element may also contain multiple preference communities.

Model validation
In the field of data structures, if a pair of numbers in a permutation appears in the opposite order to their magnitudes, that is, the earlier number is larger than the later one, the pair is called a reverse order (inversion), and the total number of reverse orders in a permutation is called its reverse order number. For example, if the standard order is 12345, the reverse order number of 21345 is 1. We introduced the concept of reverse order into the accuracy test of the preference model and defined the accuracy of the model in terms of the reverse order number.
Here, B is the reverse order number of the scenic spots ranked according to the preference model, and T is the total order number of the scenic spots sorted according to the number of favorable comments.
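The reverse order number is straightforward to compute, as the following sketch shows. The accuracy formula itself did not survive in the text, so the sketch assumes accuracy = 1 - B/T, with T taken to be the number of item pairs, n(n - 1)/2; both the formula and the function names are assumptions rather than the paper's stated definition.

```python
def inversion_number(order, standard):
    """Count pairs that appear in `order` in the opposite sequence to `standard`."""
    rank = {item: position for position, item in enumerate(standard)}
    ranks = [rank[item] for item in order]
    return sum(1
               for i in range(len(ranks))
               for j in range(i + 1, len(ranks))
               if ranks[i] > ranks[j])


def ranking_accuracy(order, standard):
    """Assumed accuracy: 1 - B / T, with T = n * (n - 1) / 2 pairs."""
    n = len(order)
    total_pairs = n * (n - 1) // 2
    return 1 - inversion_number(order, standard) / total_pairs
```

For the example in the text, `inversion_number("21345", "12345")` gives 1, and under the assumed formula the accuracy of that ranking is 0.9.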

Preference analysis based on structured data of network travel notes
In addition to text data, the crawled travel notes contain structured data about tourists' trips. Through direct statistical analysis of these data, we can obtain the tourism preferences of domestic tourists visiting Sanya. In this section, we mainly studied the following preferences based on the structured data of the crawled travel notes: travel time preference, companion preference, play preference, consumption preference, and gender preference.
As shown in Table 1, 17,239 valid travel records were crawled, each describing one trip by a tourist. From this data table, we mainly analyzed four kinds of structured data: travel time, companionship, play methods, and consumption. From the user information table, we crawled the information of 13,900 valid users and analyzed the gender data.

Preference analysis based on online travel notes
There is no established theoretical or empirical basis for setting the pruning threshold of a complex network. Therefore, this paper ran multiple simulation experiments and obtained an optimal pruning threshold of 20, as shown in Fig. 3.
We used Python's community library to run the Louvain community mining algorithm on the preference complex network and constructed the preference content classification system according to the classification method discussed above. The results of community mining are shown in Fig. 4, in which different communities are represented by different gray levels.

Preference mining results based on online travel preference model
The results of preference mining are shown in Table 2, where preference degrees are given to two decimal places.

Model validation
In the captured scenic spot information table, the number of favorable comments accurately reflects tourists' preferences for scenic spots. Therefore, this paper compared the preference sub-elements (scenic spot names) contained in the scenic spot preference elements mined by the model with the preference information in the scenic spot information table to verify the accuracy of the model. Because the captured information table only contains data on scenic spots in Sanya, only the scenic spots in the text mining results that belong to Sanya are compared.
We took the ranking of scenic spots in the statistical table of favorable comments as the standard order. According to the definition of the reverse order number, we then computed the reverse order number B of the scenic spots as ranked in the table of scenic spot preference elements and preference degrees, with T as the total order number. According to the test results, the accuracy of the proposed preference model is 85.89%, which indicates that the model can mine tourists' preferences accurately.

Results and Discussion
Based on the vast amount of online travel notes accumulated on the Internet, this paper introduces the theory and technology of text mining into the study of tourists' preferences, which provides a new method for accurately mining tourists' preferences from massive online travel notes. It has reference value both in theory and in practice. First, based on complex network theory and tourism destination perception theory, this paper proposes a distance algorithm and a relevance algorithm, constructs a model for mining tourists' preference content, and realizes preference content mining on massive text data. Second, based on the theory of emotion analysis, an emotion dictionary and a degree adverb dictionary are constructed, a model of tourists' preference degree is established, and the degree of tourists' preference for each kind of preference content is mined. Finally, Sanya is taken as an example to verify the model, which confirms the reliability of the tourists' preference model.
This paper also has some limitations. (1) The classification of preference content in the preference model is not automated. In follow-up research, machine learning can be introduced into the classification of preference content to build an automatic classification model and further improve the efficiency of tourists' preference mining. (2) The affective computing uses a basic rule-based method. Although the accuracy of mining meets the requirements, the extensibility of the model is insufficient: to mine the preferences of tourists to other cities, the affective computing dictionary must be rebuilt. Follow-up research can improve the emotion calculation method and increase the extensibility of the model.