Simulation of China’s urban tourism activity based on improved density clustering algorithm

Tourism is a pillar industry of many cities and a key driver of urban development and vitality. Analyzing the tourism activity of Chinese cities supports research on regional economic development and promotes its orderly growth, and many scholars have carried out analyses in this area. Artificial intelligence, as a young and rapidly growing field, also plays an important role in urban tourism. As science and technology develop, research on machine intelligence advances and new artificial intelligence products keep emerging, often handling workloads that exceed what manual work can manage. To keep artificial intelligence up to date, researchers combine it with data mining and with the vast knowledge disseminated over the network to build advanced knowledge network models. This paper applies an OPTICS-based clustering algorithm to cluster photographs from the Flickr website and extract information about tourism activity in Chinese cities. With the help of visualization software to present the experimental data and verify the experimental results, the method can recommend destinations based on urban tourism activity. Improved density clustering algorithms have been studied in biology and image analysis, but their application to tourism remains underexplored; this paper aims to contribute to that gap.


Introduction
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is the best-known density-based spatial clustering algorithm (Liu et al. 2012). It separates the clusters of high-density points represented by different branches, screens out and removes the low-density regions, and can finally recover clusters of arbitrary shape from a noisy data set (Duan et al. 2007). When using the DBSCAN algorithm, two parameters must be predefined: the neighborhood radius Eps and the minimum point count MinPts. Based on these, the data points are divided by distribution density into units of different densities so that the points corresponding to each density can be captured and merged into maximal sets. DBSCAN is very sensitive to these parameters: whether their values are chosen well strongly affects the accuracy of the clustering result (Schubert et al. 2017). As the name suggests, artificial intelligence uses machine intelligence to reproduce human behavior and take over work from people (Secinaro et al. 2021). Data mining extracts value from large amounts of data, retaining useful information and discarding useless information in time (Romero and Ventura 2013); it is very useful for many operating systems, where it can provide better solutions and improve the quality of work. This article obtains experimental results with an improved OPTICS algorithm, analyzes the distribution density of network users, and determines the level of user activity (Ali et al. 2020). It then performs a comprehensive analysis of China's human habitat and tourism activity, producing visual images and spatiotemporal analysis results to determine the distribution and development trends of Internet users.

Related work
The literature shows that as the era of big data slowly arrives, artificial intelligence is steadily maturing (Allam and Dhunny 2019). Analyzing raw big data and extracting valuable information from it is a core topic of academic research. The literature also shows that clustering technology occupies an important position in data mining (Kameshwaran and Malarvizhi 2014): discovering the latent internal structure of massive data is currently an important research challenge in the field of artificial intelligence. Artificial intelligence mainly relies on computer networks to connect the network with actual work, thereby improving work efficiency, and it comprises many systems (Figueiredo et al. 2019; Huang et al. 2020; Kumar and Thakur 2012). For example, the artificial intelligence commonly found in bank halls can provide people with advice and help (Fethi and Pasiouras 2010). Such systems are usually built on components that imitate humans, such as high-level language systems and recognition systems, which decompose complex tasks into processes and express each program fully in order to assist people. Data mining extracts value from large amounts of data, retaining useful information and deleting useless information in time.
The literature shows that data mining is very useful for some operating systems and can provide better solutions to improve the quality of work (Hassani et al. 2018). On this basis, data points are divided by distribution density into units of different densities so that the points corresponding to each density can be captured and merged into maximal sets; the DBSCAN algorithm is very sensitive to these parameter values, and whether they are chosen well greatly affects the accuracy of the clustering result. The literature proposed an OPTICS-based clustering algorithm for cluster analysis of photographs on the Flickr website and for determining tourist activity in Chinese cities (Guo et al. 2019). With the aid of visualization software for the experimental data and a review of the experimental results introduced in this article, urban tourism activity can be used to recommend travel destinations. The literature also shows that cluster analysis is widely applied across the country, for example in organizing tourist routes, image information processing, information collection and search, and three-dimensional technology (Higgins-Desbiolles 2018). Currently, most researchers in China use this algorithm in these fields of analysis.
Research on improved density clustering algorithm

Similarity measure
A clustering algorithm groups samples according to their mutual similarity. Ensuring that similarity is measured reasonably is therefore particularly important and can even be decisive for the performance of the clustering algorithm. Besides distance functions, similarity measurement functions can also express the similarity between objects. The ''distance'' here denotes not only Euclidean spatial distance but also differences in semantics and state, as well as differences in time and density. Moreover, when dealing with different problems and data, the appropriate measure should be chosen according to the characteristics of the data itself. Consider two d-dimensional data points x_i = (x_i1, x_i2, …, x_id) and x_j = (x_j1, x_j2, …, x_jd). This section reviews the more general similarity measures in turn.
(1) Euclidean distance. This function is often used in data mining and is defined as d(x_i, x_j) = (Σ_{k=1}^{d} (x_ik − x_jk)²)^{1/2}.
(2) Manhattan distance. The Manhattan distance, also called the L1 distance or city-block distance, is the sum of the coordinate-wise distances between two points: d(x_i, x_j) = Σ_{k=1}^{d} |x_ik − x_jk|.
(3) Chebyshev distance. The Chebyshev distance is the maximum absolute difference between corresponding components of the two vectors: d(x_i, x_j) = max_k |x_ik − x_jk|.
(4) Minkowski distance. The Minkowski distance is a family of distance functions defined as d(x_i, x_j) = (Σ_{k=1}^{d} |x_ik − x_jk|^p)^{1/p}, with parameter p > 0. If p = 1, this formula gives the Manhattan distance; if p = 2, the Euclidean distance; and as p → ∞, the Chebyshev distance.
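The four distance functions above can be sketched in a few lines of Python (the helper names are ours, for illustration only):

```python
import math

def euclidean(x, y):
    # L2 distance: straight-line distance in d-dimensional space
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # L1 (city-block) distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    # L-infinity distance: largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    # General Lp distance; p=1 -> Manhattan, p=2 -> Euclidean,
    # and large p approaches the Chebyshev distance
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

For example, for the points (0, 0) and (3, 4) these give 5, 7, 4, and (for p = 2) 5, matching the special cases listed above.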
(5) Mahalanobis distance. The Mahalanobis distance takes the covariance of the data into account, making the features effectively scale-independent. It is defined as d(x_i, x_j) = ((x_i − x_j)^T S^{−1} (x_i − x_j))^{1/2}, where S is the covariance matrix. When S is the identity matrix, the Mahalanobis distance reduces to the Euclidean distance.
(6) Cosine similarity. Cosine similarity is the cosine of the angle between two vectors, computed as the inner product divided by the product of the vector norms: cos(x_i, x_j) = (x_i · x_j) / (||x_i|| ||x_j||). The more similar the directions of the two vectors, the higher the similarity, reaching 1 when the directions coincide.
(7) Pearson coefficient. The Pearson coefficient measures how closely two variables are correlated. It uses the product-moment approach: the correlation is computed from the products of the deviations of the two variables from their respective means, r(X, Y) = Σ_k (X_k − X̄)(Y_k − Ȳ) / (Σ_k (X_k − X̄)² Σ_k (Y_k − Ȳ)²)^{1/2}. The Pearson coefficient is used to determine the closeness between a variable X and a random variable Y; the larger its absolute value, the more closely the two variables are related. If X and Y are linearly related, the coefficient equals 1 for a positive linear correlation and −1 for a negative linear correlation.
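The remaining three measures can be written compactly with NumPy (again, function names are our own sketch):

```python
import numpy as np

def mahalanobis(x, y, S):
    # Covariance-aware distance; with S = identity it reduces to Euclidean
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

def cosine_similarity(x, y):
    # Cosine of the angle between the two vectors
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson(x, y):
    # Product-moment correlation of the mean-centred vectors
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))
```

Note that the Pearson coefficient is simply the cosine similarity of the mean-centred variables, which makes the relationship between measures (6) and (7) explicit.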

Classification of clustering algorithms
Data have two main characteristics: complexity and diversity. No single clustering algorithm is mature enough to solve all problems, so researchers at home and abroad have developed many better-performing clustering algorithms, each motivated by the particular problems of a specific field. When selecting a clustering algorithm, the final choice depends on the size and structural characteristics of the data together with the characteristics and advantages of the algorithm itself.

Partition-based method
Currently, the most popular clustering algorithms are partition-based. The basic idea is as follows: given a data set of n samples, set the number of clusters k in advance, and iteratively optimize a partition criterion until the objective function converges, ending the iteration and dividing the whole data set into k clusters. Partition-based clustering mainly comprises two algorithms: k-means and k-medoids. This article focuses on k-means. Iterative reallocation is the core of the k-means algorithm: the aim is to partition the data into k clusters C = {C_1, C_2, …, C_k} by minimizing the squared-error criterion that defines the objective.
The k-means algorithm is recognized and used in many fields thanks to two main advantages: high operating efficiency and good scalability. However, it also has the following shortcomings. First, it is strongly affected by the choice of starting points, and it is difficult to reach the global optimum. Second, it can only handle clusters with a hyperspherical structure and cannot recognize cluster structures of arbitrary shape. Third, its sensitivity to outliers also limits the scope of the algorithm.
To mitigate these errors, the representative point of each cluster in the algorithm's iterations should be an actual sample entity, so that the computation of cluster centers is not distorted by outliers. The k-medoids variant therefore still minimizes the squared-error criterion, but selects a real sample point, rather than a mean, as the center of each final cluster.
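The iterative-reallocation loop described above can be sketched in pure Python (a minimal illustration, not the paper's implementation):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    # Minimal k-means sketch: repeatedly assign each point to its
    # nearest centroid, then recompute each centroid as the mean of
    # its cluster, until the centroids stop moving.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # random initial centers
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        new = [tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:                # converged
            break
        centroids = new
    return centroids, clusters
```

On two well-separated groups of points, the loop converges in a few iterations regardless of which sample points are drawn as the initial centroids; on harder data, the sensitivity to initialization noted above becomes visible.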

Method based on hierarchy
The hierarchical clustering algorithm differs fundamentally from partition-based clustering: it builds a cluster tree over the data points, and the final clustering result can be obtained either by aggregation or by decomposition. There are two main types of hierarchical clustering. The first is the bottom-up agglomerative strategy: starting from individual points, clusters are repeatedly merged until all objects form a single cluster or a specific termination condition is met, at which point all sample points have been processed and the clustering ends. The second is the top-down divisive strategy: all sample points are initially treated as one cluster, which is then split repeatedly; the clustering ends when every sample point satisfies a specific termination criterion or forms its own cluster.
Determining the distance between clusters is particularly difficult and requires substantial computation. Generally, the following measures of inter-cluster distance are used:
1. Nearest distance: the distance between the two closest data points, one from each cluster, is the inter-cluster distance. In the definitions below, the two clusters are denoted C_i and C_j, with x ∈ C_i and y ∈ C_j.
2. Farthest distance: the distance between the two data points that are farthest apart, one from each cluster, is the inter-cluster distance.
3. Centroid distance: the distance between the centroids of the two clusters is the inter-cluster distance.
4. Average distance: the average of the distances between all pairs of sample points from the two clusters is the inter-cluster distance.
Each choice of measure yields different results, which makes the choice a genuinely troublesome problem. A hierarchical clustering algorithm that measures inter-cluster distance by the nearest distance is called a single-link algorithm; one that uses the farthest distance is called a complete-link algorithm; and one that uses the average distance is called an average-link algorithm.
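The four inter-cluster distance measures can be written directly (an illustrative Python sketch with our own helper names):

```python
import math

def euclid(x, y):
    # Plain Euclidean distance between two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def single_link(Ci, Cj, dist=euclid):
    # Nearest distance: minimum over all cross-cluster point pairs
    return min(dist(x, y) for x in Ci for y in Cj)

def complete_link(Ci, Cj, dist=euclid):
    # Farthest distance: maximum over all cross-cluster point pairs
    return max(dist(x, y) for x in Ci for y in Cj)

def average_link(Ci, Cj, dist=euclid):
    # Average distance over all cross-cluster point pairs
    return sum(dist(x, y) for x in Ci for y in Cj) / (len(Ci) * len(Cj))

def centroid_link(Ci, Cj, dist=euclid):
    # Distance between the two cluster centroids
    ci = tuple(sum(v) / len(Ci) for v in zip(*Ci))
    cj = tuple(sum(v) / len(Cj) for v in zip(*Cj))
    return dist(ci, cj)
```

For the same pair of clusters these four functions generally return different values, which is exactly why single-link, complete-link and average-link algorithms can produce different cluster trees from the same data.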

Density-based method
Partition-based and hierarchical clustering algorithms can only detect clusters of certain shapes, such as convex clusters. To overcome this limitation and identify clusters of arbitrary shape, density-based algorithms were created. Their basic idea is that the target clusters in the sampling space consist of dense sample points, and clusters are separated from one another by low-density regions. The main procedure of such an algorithm is to filter out the low-density regions and detect the dense sample points.

Model-based approach
Model-based clustering algorithms assume that the data follow a certain mathematical distribution, and the ultimate goal is to find the model that best fits the data. The data set is usually described by a mixture model in which different probability components correspond to different clusters. Besides the EM algorithm, the most representative method is the COBWEB algorithm; this article focuses on the EM algorithm.
The EM algorithm shares three similarities with the partition-based method: first, the ability to find spherical clusters of different sizes; second, good efficiency; and third, time complexity that is roughly linear. However, the EM algorithm assumes that the probability distributions of all attributes are independent, which cannot be truly guaranteed. The EM algorithm flow is shown in Table 1 below.
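As a concrete illustration of the alternating E- and M-steps, here is a toy sketch for a two-component one-dimensional Gaussian mixture (our own simplification, not the paper's implementation):

```python
import math

def em_gmm_1d(data, iters=200):
    # EM for a 2-component 1-D Gaussian mixture:
    # E-step computes each component's responsibility for each point;
    # M-step re-estimates weights, means and variances from them.
    mu = [min(data), max(data)]        # crude initialization
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step
        resp = []
        for x in data:
            p = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p])
        # M-step
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk + 1e-6
    return mu, var, pi
```

On data drawn around two well-separated values, the estimated means converge toward the two group means, illustrating the "spherical clusters of different sizes" behavior mentioned above.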

Grid-based method
Grid-based algorithms perform clustering on a grid data structure. In one sense they are very similar to density-based clustering algorithms, but they differ in that they can perform clustering at multiple resolutions and are faster: the processing time is independent of the number of data points and depends only on the number of cells in the quantized space. The three main categories of grid clustering algorithms today are the CLIQUE algorithm, the wavelet clustering algorithm, and the STING algorithm.
The wavelet clustering algorithm has the following characteristics. First, it is highly efficient, with approximately linear time complexity. Second, it can discover spatial clusters of arbitrary shape at different spatial scales. Third, it is robust to noise. Fourth, it can process large amounts of data efficiently. Fifth, it is not very sensitive to the order of data input. Wavelet clustering still has shortcomings, such as being unable to adapt to uneven spatial data, to cope with excessive density changes, or to account simultaneously for spatial proximity and the similarity of subject attributes. The wavelet clustering (WaveCluster) algorithm flow is shown in Table 2.

Spatial clustering algorithm based on density with noise
Density-based clustering methods generally measure the spatial density of the data set as the clustering criterion. The number of clusters need not be known in advance, so the computation is efficient and the answer is obtained quickly. The DBSCAN algorithm can not only identify clusters of all sizes and shapes but also accurately determine outliers, and the order of input has no effect on the final result. These advantages have broadened its fields of application. Definitions related to the DBSCAN algorithm include: Definition 1 The Eps-neighborhood of a data point P: the set of points within a circle of radius Eps centered on P.
Definition 2 The density of a data point P: the number of data points within its Eps-neighborhood.
Definition 3 Core point: a data point P whose density is at least MinPts, where the MinPts threshold is user-defined.
Definition 4 Border point: a data point q that is not itself a core point but lies within the Eps-neighborhood of some core point P.
In the DBSCAN algorithm, the data set D is taken as the object, from which an arbitrary data point P is selected and checked to find the clustering relationships. If P is a core point, that is, the number of points in its Eps-neighborhood is at least MinPts, then the points in that neighborhood can serve as starting points for the next step and are checked in turn. In detail: a range query is issued around each starting point, and if the starting point is itself a core point, the points within its Eps-neighborhood are added to the starting point's cluster. Continuing to expand in this way completes the cluster containing the current data point.
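The expansion procedure just described can be sketched as follows (a naive O(n²) Python illustration; the label −1 marks noise):

```python
def dbscan(points, eps, min_pts):
    # Naive DBSCAN sketch: grow each cluster outward from a core
    # point via repeated Eps-range queries; points reachable from
    # no core point remain labelled as noise (-1).
    def region_query(i):
        return [j for j in range(len(points))
                if sum((a - b) ** 2
                       for a, b in zip(points[i], points[j])) <= eps ** 2]

    labels = [None] * len(points)
    cid = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region_query(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cid += 1                        # start a new cluster at core point i
        labels[i] = cid
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:         # previously-noise border point
                labels[j] = cid
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = region_query(j)
            if len(jn) >= min_pts:      # j is itself a core point: expand
                seeds.extend(jn)
    return labels
```

Border points are absorbed into the cluster but, matching Definition 4, are not expanded further; only core points propagate the range queries.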

Density peak clustering algorithm
The density peak clustering algorithm has certain advantages: it can efficiently cluster data of arbitrary shape. The algorithm first finds the cluster centers, whose characteristics can be summarized in two points. First, a cluster center is denser than its surroundings: it is surrounded by neighbors of lower density, and the density of the surrounding data points decreases with distance. Second, compared with other high-density data points, a cluster center lies at a relatively large ''distance'' from any point of higher density. The procedure of the density peak clustering algorithm is: first, identify the cluster centers based on the above characteristics; second, assign each remaining data point to the cluster of its nearest neighbor of higher density. Let S denote the data set to be clustered, I_S the corresponding index set, and d_ij the distance between data points x_i and x_j. The density peak algorithm introduces, for each sample i, the local density ρ_i and the distance δ_i to the nearest sample j of higher density.
The local density is defined as ρ_i = Σ_j χ(d_ij − d_c), where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise. When the data points are continuous, the local density can instead be computed with an exponential (Gaussian) kernel: ρ_i = Σ_j exp(−(d_ij / d_c)²). The parameter d_c is the cutoff distance, which must be set manually. In the density peak clustering algorithm, the cutoff distance d_c affects the local density of the samples; as the number of samples in the data set increases, so do the neighborhood counts, so the sample size affects the density estimates.
The distance to the nearest higher-density sample is computed as δ_i = min_{j: ρ_j > ρ_i} d_ij; for the point of highest density, δ_i = max_j d_ij. From ρ and δ, a decision graph can be drawn, and according to the properties of cluster centers, the points with both high density and high δ are identified as centers. When determining the cluster centers from the decision graph, qualitative analysis involving subjective judgment is used rather than quantitative analysis, so different people may obtain different answers from the same method and the answer is not unique: some may regard certain points as cluster centers while others choose differently, as shown in Fig. 1.
In the decision graph, the points with larger ρ and δ should be chosen as cluster centers. Each remaining sample j is then directly assigned to the cluster of its nearest neighbor of higher density. With this one-step allocation strategy, the efficiency of the DPC algorithm is significantly improved. Plotting δ against ρ allows the cluster centers of a data set of any dimension to be represented on a two-dimensional plane, thus supporting the analysis of each cluster. However, if the clusters are small, or the data set is sparse relative to the number of ''ideal'' clusters, it is difficult to obtain effective, clear density peaks, and the density estimation method above struggles to find the peak points. In that case, ρ and δ can be considered jointly through the quantity γ_i = ρ_i δ_i. The larger the value of γ_i, the more likely x_i is to be a cluster center, so it suffices to sort the γ values in descending order and take the first several data points as cluster centers. See Fig. 2. The algorithm flow is: Step 1: identify the core points. Step 2: repeat steps 3 and 4. Step 3: select an unmarked core point and record it, together with all points directly density-reachable from it, as one cluster. Step 4: continue until all core points are marked.

Definition of outlier
In the figure above, the horizontal axis represents the sample index and the vertical axis the γ value. For samples that are not cluster centers, the γ values are relatively stable overall, while at the cluster centers the γ value jumps away from the non-center values.
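The quantities ρ, δ and γ described above can be computed with a short sketch (pure Python; the cutoff value and all names are illustrative):

```python
import math

def density_peaks(points, dc):
    # Density-peak quantities: rho_i counts neighbours within the
    # cutoff dc; delta_i is the distance to the nearest point of
    # strictly higher density (or the largest distance, for the
    # densest point); gamma_i = rho_i * delta_i ranks candidate
    # cluster centres in descending order.
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [d[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(d[i]))
    gamma = [r * dl for r, dl in zip(rho, delta)]
    return rho, delta, gamma
```

On two small groups of points of different density, the largest γ value falls on a point of the denser group, exactly the jump the decision graph visualizes.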

Cluster validity evaluation index
Clustering is unsupervised, so we cannot directly judge whether a partition is correct. It is therefore particularly important to establish evaluation indices that assess the rationality and efficiency of clustering results. Currently, the most commonly used indices fall into three categories. First, internal evaluation indices, which depend only on the data set itself and the clustering structure: they mainly measure the similarity relationships within and between clusters, thereby assessing the quality of the clusters formed. Second, external evaluation indices, which compare the clustering against a known reference classification to assess its effectiveness. Third, relative indices, which judge the final clustering result on the basis of cluster compactness and separability.
The clustering experiments in this paper combine the above analysis with practical applications and use two standard indices to evaluate the results: the silhouette (contour) index and the F-measure index.
(1) Silhouette index. Suppose a data set is divided into k clusters C_i. For a sample t in cluster c, let a(t) denote the average dissimilarity (average distance) between t and the other samples of its own cluster, and let b(t) denote the minimum, over the other clusters, of the average dissimilarity between t and the samples of that cluster. The silhouette index is then computed from a(t) and b(t). (2) F-measure index. The F-measure is an external index that combines recall and precision; this composite score is used especially in the field of information retrieval. Let P denote the precision of the actual clustering and R the recall with respect to the reference classes C, both computed over pairs of points. A pair (p, q) falls into one of four relationships: (1) p and q belong to the same class in C and the same partition in P; (2) p and q belong to the same class in C but not to the same partition in P; (3) p and q are not in the same class in C but belong to the same partition in P; (4) p and q are not in the same class in C and do not belong to the same partition in P.
Counting these four cases as a, b, c and d respectively, precision and recall are P = a / (a + c) and R = a / (a + b), and the F-measure index is expressed as F = 2PR / (P + R). A weighted average over clusters, with N representing the number of sample points, then gives the overall F-measure of the result. The accuracy of the algorithm is proportional to the F-measure.
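A direct sketch of the silhouette computation (pure Python, our own helper; the pair-counting F-measure would follow the same pattern):

```python
import math

def silhouette(points, labels):
    # Silhouette index: for each sample, a = mean distance to its
    # own cluster's other members, b = smallest mean distance to any
    # other cluster; the sample score is (b - a) / max(a, b), and the
    # index is the mean score over all samples.
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        a = sum(own) / len(own)
        b = min(sum(math.dist(p, q)
                    for j, q in enumerate(points) if labels[j] == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Scores near 1 indicate compact, well-separated clusters; scores near 0 indicate overlapping clusters, and negative scores suggest misassigned samples.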

Specific concepts
As the name suggests, artificial intelligence can reproduce human behavior through machine intelligence, but it cannot fully replace people. Artificial intelligence mainly relies on people to control it through computer programs, and its work efficiency exceeds that of humans. Different artificial intelligence systems may have different functions. Capable artificial intelligence is usually equipped with systems that imitate humans, such as high-level language systems and recognition systems, which decompose complex tasks into processes and express each program as a whole to help people. Data mining effectively acquires large amounts of information and can quickly find the information we need. It is very useful for some operating systems: it can provide better solutions and improve work efficiency. Data mining is also very useful and important in today's society. It not only continues the advantages of artificial intelligence but also strengthens them; the product of combining the two is very effective and allows us to study data further. Therefore, data mining has an important impact on artificial intelligence.

Specific applications of data mining in artificial intelligence
Data mining makes it easier for people to select data. When selecting data, the data layer can be chosen first, which shortens the search time; the system can then automatically delete vague or inaccurate information. Artificial intelligence is generally highly systematic, with every connection properly managed: the data from the entire network are monitored in real time, and each piece of information is processed and analyzed many times. Artificial intelligence can not only process data autonomously but also learn and update independently. Taking legal applications as an example, a library of illegal behaviors can be used as source material, together with interpretations of laws and regulations, and feature-extraction techniques such as document frequency (DF), mutual information, expected cross-entropy and related statistical measures can be applied to establish a functional model of illegal behavior. Behavior patterns can mirror those of humans: features are extracted scientifically to create mathematical models that describe and substitute for behavior. The subjects and objects involved in the analysis are combined into a composite database that is easy to search, comprising an expert-experience database, an information database and an illegal-behavior database, over which classification mining algorithms are used to learn human behavior. From a professional perspective, through self-learning, deep understanding of natural language, pattern recognition and the study of different measurement algorithms, and by automatically extending and continuously optimizing the existing data models and algorithms, the intelligence and accuracy of judicial decisions can be gradually improved. The human brain cannot match the high simulation performance and learnability of artificial intelligence, which can continuously study a designated target, switch between different angles in the system to simulate operations, and make flexible, assured decisions in practice.
Artificial intelligence is very powerful and requires technical experts to constantly update the artificial intelligence system to develop more functions to assist humans.

Conditions required to run artificial intelligence
Artificial intelligence must usually be controlled by technicians. Because the methods involved are so wide-ranging, personnel must have technical knowledge and rich practical experience in order to issue commands to intelligent machines. If the technical knowledge is not fully absorbed and operating errors occur, hidden dangers may be introduced into the AI system, shortening its life. Artificial intelligence contains many complex, interacting systems and therefore processes a great deal of data; after long-term use, problems may arise that can even endanger the humans around it. To prevent such situations, inspections and repairs must be scheduled regularly, so that system failures do not interrupt work and reduce efficiency.
Analysis of China's urban tourism activity

Urban tourism activity analysis method based on weighted OPTICS
Tourist activity depends on the advantages and disadvantages of a city's location, the information content, and the level of visitor activity. Besides providing location information, web application platforms usually need to let users choose what they like, so recommendations should be based on what people need. Here the plain OPTICS algorithm has a limitation. As shown in Fig. 3, when OPTICS is used directly to compare the point density of city 1 with that of city 2, the locations in city 1 are relatively dense, so one would conclude that tourist activity in city 1 is higher than in city 2. However, if the total number of photographs over all locations in city 1 is a and the total over all locations in city 2 is h, then when h > a the tourism activity of city 2 should actually be higher than that of city 1. A schematic diagram of the photograph locations in the two cities is shown in Fig. 3.
To address this defect, we modified OPTICS into an improved analysis method. First, the OPTICS algorithm is used to find the core distance and the reachability distance of each sampling point, and then the average core distance and the average reachability distance are calculated.
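The two OPTICS quantities used here can be sketched as follows (a naive Python illustration; one common convention, assumed here, lets a point count itself in its Eps-neighborhood):

```python
import math

def core_distance(points, i, eps, min_pts):
    # Core distance of point i: the distance to its min_pts-th
    # nearest neighbour (the point counts itself), defined only
    # when the Eps-neighbourhood holds at least min_pts points.
    d = sorted(math.dist(points[i], q) for q in points)
    if sum(1 for x in d if x <= eps) < min_pts:
        return None                     # i is not a core point
    return d[min_pts - 1]

def reachability_distance(points, i, j, eps, min_pts):
    # Reachability of j from i: max(core distance of i, dist(i, j));
    # undefined when i is not a core point.
    cd = core_distance(points, i, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[i], points[j]))
```

Averaging these values over a region's points, as described above, gives the density component of the activity score: the smaller the average core distance, the denser the region.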
Because the distance between points is inversely proportional to the point density, the improved algorithm uses the reciprocal: the greater the reciprocal of the core distance, the denser the area. The number of photographs in a given area can further reflect the tourist activity there. Therefore, the urban tourism activity value in this article depends on two aspects: the density of the location points and the number of photographs in the area. Accordingly, we set the weight of the location-point density to 0.8 and the weight of the photograph count to 1.2.
Combining these two weighted terms gives a formula for calculating the tourism activity of a location in a city.
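As a minimal sketch of how such an activity score could be computed: the 0.8/1.2 weights and the use of the reciprocal of the core distance come from the text above, but combining the two terms as a weighted sum, and all function and variable names, are assumptions of this sketch.

```python
def tourism_activity(mean_core_dist, photo_count,
                     density_weight=0.8, photo_weight=1.2):
    """Weighted tourism-activity score for one location (illustrative).

    mean_core_dist: average core distance of the location's points
                    (smaller => denser, so its reciprocal is used).
    photo_count:    total number of photographs taken at the location.
    The 0.8 / 1.2 weights follow the text; summing the two weighted
    terms is an assumption of this sketch, not the paper's formula.
    """
    density_term = 1.0 / mean_core_dist  # reciprocal: denser => larger
    return density_weight * density_term + photo_weight * photo_count
```

For example, a location with an average core distance of 0.5 and 100 photographs would score 0.8 * 2 + 1.2 * 100 = 121.6 under this sketch.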

Experimental results and analysis
Flickr is a photograph-sharing website from Yahoo, an online service that provides free and paid plans for storing and sharing digital photographs. Flickr maintains a collection of digital images linked to the relationships between users, and images can be linked together according to their content. Uploaders can attach keywords, or ''tags'', to their photographs, describing the photograph or its subject; tags allow users to quickly find the photographs they need, and a creator can quickly find other people who use the same tag. Flickr provides an official API for researchers, available at https://www.flickr.com/services/api/. The specific steps are as follows: Step 1. Obtain the API key and sign the API request.
Step 2. Read the community guidelines and API terminology used.
Step 3. Build and test. First, the weighted OPTICS algorithm proposed in this article is used to analyze the latitude and longitude of photographs in the 3012 regions defined by WOEID, and to find the core distance and reachability distance of each point. The data graph of core distances is shown in Fig. 4, and the data graph of reachability distances is shown in Fig. 5.
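The retrieval step can be sketched as building a `flickr.photos.search` request for one WOEID region. The endpoint and parameter names follow the public Flickr API documentation, but the helper below only constructs the URL; key signing (step 1) and the actual HTTP call are omitted, and the function name is illustrative.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def build_photo_search(api_key, woe_id, page=1, per_page=250):
    """Build a flickr.photos.search request URL for one WOEID region.

    A real call would also attach an api_sig for signed requests.
    Treat the parameter set as a sketch based on the public API docs.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "woe_id": woe_id,            # Where-On-Earth ID of the region
        "extras": "geo,date_taken",  # request latitude/longitude per photo
        "format": "json",
        "nojsoncallback": 1,
        "page": page,
        "per_page": per_page,
    }
    return FLICKR_REST + "?" + urlencode(params)
```

Fetching each page of results with an HTTP client and paging through `per_page`-sized batches would then yield the photograph coordinates per region.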
We then use the weighted OPTICS algorithm to process the photographs in the 3012 regions and determine the tourism activity of each region. The data graph of regional activity is shown in Fig. 6.
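The two distances that OPTICS orders points by can be sketched in plain Python using their standard definitions (core distance: distance to the MinPts-th nearest neighbour, with max_eps taken as infinity; reachability distance: the larger of the core distance and the pairwise distance). The function names are illustrative; in practice a library implementation such as scikit-learn's `OPTICS` would be used.

```python
import math

def core_distance(points, i, min_pts):
    """Core distance of points[i]: distance to its min_pts-th nearest
    neighbour (OPTICS definition, with max_eps taken as infinity)."""
    dists = sorted(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i)
    return dists[min_pts - 1]

def reachability_distance(points, i, j, min_pts):
    """Reachability distance of points[j] from points[i]:
    max(core_distance(i), dist(i, j))."""
    return max(core_distance(points, i, min_pts),
               math.dist(points[i], points[j]))
```

A smaller core distance means a denser neighbourhood, which is why the improved algorithm above weights locations by the reciprocal of this quantity.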
It can be seen that each concave area in the graph can be regarded as a cluster in which the number of people is relatively dense, for example Tianjin, Shanghai, Haikou and Kunming. This article divides regional activity into five levels. The fifth level comprises areas with more than 50,000 tourist activities, such as Beijing, Shanghai, Tianjin and Hong Kong.
The fourth level comprises areas where the number of tourist activities is greater than 30,000 and less than 50,000, such as Chongqing and Dalian.
The third level comprises areas where the number of tourist activities is greater than 10,000 and less than 30,000, such as Shaanxi, Chang'an and Xiwang.
The second level comprises areas where the number of tourist activities exceeds 7950 but is less than 10,000, such as Jilin and Anhui.
The first level comprises the remaining areas of the country after removing the fifth, fourth, third and second levels.
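The five-level grading above amounts to a simple binning of each region's activity count. The sketch below encodes the thresholds stated in the text; the text leaves the exact boundary values (e.g. exactly 50,000) unspecified, so the comparisons here are one possible reading.

```python
def activity_level(activity_count):
    """Map a region's tourist-activity count to the five levels in the
    text. Boundary values fall into the lower level here, since the
    text does not specify where exact thresholds belong."""
    if activity_count > 50_000:
        return 5   # e.g. Beijing, Shanghai, Tianjin, Hong Kong
    if activity_count > 30_000:
        return 4   # e.g. Chongqing, Dalian
    if activity_count > 10_000:
        return 3   # e.g. Shaanxi
    if activity_count > 7_950:
        return 2   # e.g. Jilin, Anhui
    return 1       # all remaining regions
```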

Visual analysis of tourism activity in Chinese cities
To present the experimental results more intuitively, this article takes the information obtained for the 3012 regions and displays it visually. When a data point is clicked, the display shows data in a form such as ''99: 1'': before the colon is the location code, and the number after it is the total number of photographs at that location. As shown in the figure, the number in the lower left corner is the photograph count; a location is displayed only if its photograph count falls within the selected range. The Flickr China location map is shown in Fig. 7. We use Flickr to conduct a comprehensive analysis of China's human environment and tourism activities, and combine the visual images with the spatiotemporal analysis results to infer the distribution and development trend of Internet users.

Conclusion
Clustering is based on the principle that ''like gathers with like'': it takes a specific data set as its object and separates it into related groups or clusters, so that objects within a cluster are highly similar to each other while objects in different clusters are not. Data come in many forms, and existing research in the academic field usually addresses specific problems in particular domains with purpose-built clustering algorithms, but such algorithms are not interchangeable. When choosing a clustering algorithm, the data size and structural characteristics must be considered together with the advantages and characteristics of each algorithm to ensure that the final choice is reasonable. When selecting data, first-layer selection can greatly reduce the search time, and the system can automatically delete vague or inaccurate information. Artificial intelligence is generally systematic, with each connection properly managed: data from the entire network are monitored in real time, and each piece of information is processed and analyzed many times. The weighted OPTICS algorithm can determine tourism activity in Chinese cities; visualization software is used to visualize the experimental data and verify the experimental results introduced in this article. The resulting urban tourism activity measures can be used to recommend tourist destinations, organize tourist routes, and so on.
Funding The authors have not disclosed any funding.
Data availability Data will be made available on request.

Declarations
Conflict of interest The authors declare that they have no conflict of interests.
Ethical approval This article does not contain any studies with human participants performed by any of the authors.