Python Java Joint Implementation of Internet-Based Public Opinion Information Collection

With the rapid development of the Internet, the network has become a new form of media that spreads daily news and public opinion quickly and has penetrated deeply into social life. People express opinions more actively online than in real life, and major or popular events are quickly discussed by network users, forming public opinion. Many people express their ideas and opinions through the Internet, and some forward and follow up on related information, which can place heavy public opinion pressure on the relevant departments and on the speakers themselves, sometimes with unpredictable results. Therefore, this paper uses Python and Java technology to build a network public opinion information collection system. The requirement analysis is based on the currently popular UML modeling tools, and subsequent descriptions use UML wherever possible to analyze cases and scenarios. The test results show that, when persistence is required and read performance must be guaranteed, MySQL's read performance is about 30 times higher than that of Lucene-based files, which further demonstrates the efficiency of the system. By continuously optimizing the joint use of Python and Java, the system achieves good results in collecting public opinion and related information on the Internet.


Introduction
The number of Internet users in China has reached the highest level in its history. The growth curve shows that both the total number of users and the growth rate are currently relatively stable [1]. In addition, due to the rapid popularization of smartphones, tablets and other smart terminal devices, together with the expansion of wireless network access by Internet operators, the number of wireless network users on smart terminals, including mobile phone users, is expected to increase exponentially in a short time, which has reached a worrying level [2]. The growing number of Internet users has brought great changes to people's daily life. Opinions and ideas spread rapidly on the Internet, and the discussion of major events sometimes goes beyond socially accepted limits, leading to unforeseeable events and ultimately irreversible results [3]. These events show that the Internet has become an important place for netizens to spread social ideas and safeguard their personal interests. Such phenomena are reflected in online public opinion. With the constant attention of the government and relevant departments, the Internet has also become an important place for political decision-making and information dissemination [4]. In recent years, events in other countries, such as the use of Twitter during the revolution that overthrew the Libyan regime and the riots in London, Britain, have had serious security implications. Examples of social media being used in collective events in other countries are also very significant [5]. Therefore, researchers further study emerging social networks on the basis of integrated network theory and other theories, from the perspective of national stability and social services, and analyze the related public opinion analysis and emergency management methods [6][7]. This approach can effectively collect complex information on the network and control it in real time, which is of great significance to the stability of the country and society.

Related Work
From the perspective of public opinion software requirements, the literature has carried out a detailed survey of relevant software at home and abroad, designed the target software system, and studied the basic theory of complex networks, especially the parts closely related to social network structure analysis and information dissemination [8][9]. The literature has implemented analysis and mining algorithms for social network user relations: it constructed an improved LPA community division algorithm based on edge parameters that overcomes the shortcomings of the traditional LPA algorithm, constructed a node-based opinion leader discovery algorithm that can quickly find opinion leaders in large-scale social networks, and built analysis software on this basis that can operate on and analyze real data [10]. The literature adopts the standard software development model to analyze the requirements of the system and gives its non-functional requirements [11]. In addition, with the help of a UML modeler, the functional modules and interfaces of the public opinion information collection system are designed in detail, together with the system performance and display interface [12]. The literature uses Python crawler tools to obtain the internal data of the blockchain-based Steemit social network and the traditional Tumblr social network.
Social network analysis is used to compare Steemit and Tumblr in terms of degree centrality, closeness centrality, eigenvector centrality, and betweenness centrality [13].
The literature uses a lightweight MySQL database to store data. As the main users of the system, administrators and ordinary users are responsible for managing and maintaining system affairs [14].
Ordinary users play two roles: content distributors who publish public opinion content to the system, and content receivers who vote on the system content [15]. The literature designs incentive measures based on game theory to constrain user behavior. A content publisher can only publish reliable public opinion content, and a receiver can only evaluate the authenticity of public opinion content correctly, which finally realizes its identification [16].

Python Java Union and Public Opinion Information Collection Model

Basic overview of Python and Java
Python is an open source language. Its main features are that it is easy to learn, powerful, and cross-platform. It is known as a "glue language" because of its ease of use and wide applicability, and it has therefore been widely adopted. It is an interpreted language, so program code can be executed and debugged directly without compiling. It offers many options for web frameworks and GUI frameworks. In addition, as an object-oriented language, it fully supports object-oriented features such as inheritance, derivation, and message passing, which improves reusability, reliability, and scalability. It also supports dynamic typing and provides rich APIs and tools for extending a system with other programming languages. The Python interpreter can also be embedded into other programs that require a scripting language.

Joint realization and evaluation method
The quality evaluation of mixed source software first calculates the secondary index values of the MSQM model after information collection. The previously collected information is extracted from the database and substituted into the secondary indicator formulas of the MSQM model; the result of each secondary indicator is then calculated and stored in the database.
The calculation method of some secondary indicators in the model is described below. The index importance judgment matrix A is constructed with the following structure:

$$A = (a_{ij})_{n \times n} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \tag{4}$$

where $a_{ij}$ represents the importance assignment of the jth index relative to the ith index.
Using the relevant concepts of AHP, it is first necessary to calculate the eigenvector $\varpi$ corresponding to the maximum eigenvalue $\lambda_{\max}$ of matrix A. The structure of $\varpi$ is as follows:

$$\varpi = (\varpi_1, \varpi_2, \ldots, \varpi_n)^{T} \tag{5}$$

Then the eigenvector $\varpi$ is normalized, that is:

$$w_i = \frac{\varpi_i}{\sum_{j=1}^{n} \varpi_j}, \quad i = 1, 2, \ldots, n \tag{6}$$

To check consistency, first calculate the consistency indicator CI:

$$CI = \frac{\lambda_{\max} - n}{n - 1} \tag{7}$$

Figure 1 shows the experimental results of the MSQM model in a pie chart.
Figure 2 shows the average Kendall correlation coefficient over all experimental results. The average experimental value of the MSQM model is 0.84, which is higher than the 0.78 of the ISO 25010 model and the 0.76 of the sub model. In general, compared with traditional quality evaluation models, the MSQM model is more suitable for the quality evaluation of mixed source software, better reflects its quality characteristics, and gives better evaluation results.
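The AHP weighting steps described above (principal eigenvector, normalization, consistency indicator) can be illustrated with a minimal sketch. It assumes the standard AHP definition CI = (λmax − n) / (n − 1) and uses power iteration to approximate the principal eigenvector; the function name `ahp_weights` and the example matrix are hypothetical and are not taken from the MSQM implementation.

```python
def ahp_weights(matrix, iterations=100, tol=1e-12):
    """Return (weights, lambda_max, ci) for a square pairwise judgment matrix.

    Assumption: standard AHP, with the principal eigenvector approximated
    by power iteration and CI = (lambda_max - n) / (n - 1).
    """
    n = len(matrix)
    vec = [1.0 / n] * n
    lambda_max = float(n)
    for _ in range(iterations):
        # Multiply the matrix by the current eigenvector estimate.
        product = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        # Because vec sums to 1, sum(A @ vec) estimates lambda_max.
        lambda_max = total
        new_vec = [p / total for p in product]  # normalize so the weights sum to 1
        converged = max(abs(a - b) for a, b in zip(new_vec, vec)) < tol
        vec = new_vec
        if converged:
            break
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    return vec, lambda_max, ci


# Hypothetical, perfectly consistent 3x3 judgment matrix.
judgments = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights, lambda_max, ci = ahp_weights(judgments)
print(weights, lambda_max, ci)
```

For a perfectly consistent matrix such as this one, the maximum eigenvalue equals the matrix size and CI is approximately zero; larger CI values indicate less consistent judgments.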

Public opinion information collection model
A word is the smallest grammatical unit that can be used independently, and it is the basis on which a computer understands natural language. The concept of a "word" has always been a puzzling problem in the field of Chinese linguistics, and there has never been a generally recognized, authoritative definition of it. This is mainly because it is difficult to draw the boundary between a single-character word and a morpheme, and between a word and a phrase. Moreover, people differ in their language ability and in how they divide language into units.
In the strict sense, automatic word segmentation is a problem without a clear definition. Therefore, the main task of Chinese automatic word segmentation is to determine the segmentation specification.
The model parameters include A, B and π, which can be obtained through statistical sampling. For part-of-speech tagging, consider a known word string $W = w_1 w_2 w_3 \ldots w_n$. Because the number of words is fixed, the number of states of the hidden Markov model is fixed; because the part-of-speech types corresponding to each word are defined by the dictionary, the observation symbols of the hidden Markov model are also defined. Given the word sequence W and the model parameters, the task is to find the state sequence $T = t_1 t_2 t_3 \ldots t_n$ that best explains W, which can be expressed by the formula:

$$\hat{T} = \arg\max_{T} P(T \mid W, \lambda) \tag{8}$$

where $\lambda$ represents the model parameters. Because the goal is only to find the maximum, the elements of the formula that do not affect the final comparison can be omitted. $\lambda$ is the same in all cases, and $P(W)$, the probability of the word string, is the same for all possible labeling results and can also be ignored. By the conditional probability formula and Bayes' formula:

$$\hat{T} = \arg\max_{T} \frac{P(W \mid T)\,P(T)}{P(W)} = \arg\max_{T} P(W \mid T)\,P(T)$$

where:

$$P(W \mid T)\,P(T) \approx \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1}) \tag{11}$$

Here $P(t_i \mid t_{i-1})$ and $P(w_i \mid t_i)$ can be counted from training data. Enumerating all possible state sequences for a given word string is inefficient, so the Viterbi algorithm is used to solve this efficiency problem.
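The Viterbi decoding step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's tagger: the tag set, the probability tables, and the smoothing floor for unseen events are all hypothetical, and log-probabilities are used only to avoid numerical underflow.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    start_p[t]        : P(t) for the first tag
    trans_p[(p, t)]   : P(t | p), tag transition probability
    emit_p[(t, w)]    : P(w | t), word emission probability
    Unseen events fall back to `floor` (a hypothetical smoothing choice).
    """
    if not words:
        return []

    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t]: best log-probability of any tag path ending in tag t at word i.
    best = [{t: logp(start_p, t) + logp(emit_p, (t, words[0])) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the predecessor tag that maximizes the path score.
            prev, score = max(
                ((p, best[i - 1][p] + logp(trans_p, (p, t))) for p in tags),
                key=lambda item: item[1],
            )
            best[i][t] = score + logp(emit_p, (t, words[i]))
            back[i][t] = prev

    # Trace the best path backwards from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model with two tags: N (noun) and V (verb).
tags = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {("N", "N"): 0.3, ("N", "V"): 0.7, ("V", "N"): 0.8, ("V", "V"): 0.2}
emit_p = {("N", "dogs"): 0.6, ("V", "dogs"): 0.05, ("N", "bark"): 0.1, ("V", "bark"): 0.7}
print(viterbi(["dogs", "bark"], tags, start_p, trans_p, emit_p))
```

The dynamic program keeps only the best path into each tag at each position, so the cost grows linearly with sentence length instead of exponentially with the number of candidate tag sequences.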
Clustering is the process of dividing a data set in a multidimensional space into multiple meaningful subgroups or classes, so that the similarity of samples within a class is as large as possible and the similarity of samples in different classes is as small as possible. Text clustering is an important branch of clustering: it is a text data mining technique that groups texts into different classes according to the characteristics of the text data, and it can effectively classify and summarize information. The basic processing of text clustering is shown in Fig. 3.
For the identification of hot spots, the most intuitive representation is the heat trend chart of hot topics, as shown in Fig. 4. The trend chart over time t1-t8 represents topic a. The heat trend chart of a topic intuitively presents its status and trend over a period of time, and even its likely future trend. The horizontal axis of the chart represents time, and the unit varies according to specific needs, such as seconds, minutes, hours, days, or months. The vertical axis represents the topic weight value, which is obtained through the weight calculation method.
The most important point in URL sorting is that priority is determined by the relevance of the page to the dynamic topic. When crawling, the crawler obtains a URL before entering the page. If the URL has low relevance to the keywords, it is discarded and not placed in the crawl queue. In this paper, a URL enters the crawl queue when its relevance exceeds the threshold. While a queue holds fewer than 100 URLs, it is kept sorted; once it exceeds 100, a new URL queue is created. When there are multiple queues, the head of each queue is scanned separately. Priority calculation usually refers to the Harvest Rate, which is given by Eq. (12). The inverse document frequency (IDF) is mainly used to count how often a term occurs in the other documents of the global document set. If a feature term occurs frequently in other documents, it is likely to be an auxiliary or function word rather than a representative word. Information gain mainly refers to the difference in information entropy between texts, and is given by Eq. (13).
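The queueing rule and the IDF statistic described above can be sketched as follows. This is an illustrative sketch, not the paper's crawler: the keyword-overlap relevance score, the default threshold of 0.5, and the smoothed form log(N / (1 + df)) are assumptions introduced here, and all function names are hypothetical. Only the queue cap of 100 comes from the text.

```python
import math

QUEUE_LIMIT = 100  # queue size cap stated in the text


def relevance(url_text, keywords):
    """Hypothetical relevance score: fraction of keywords found in the URL or anchor text."""
    if not keywords:
        return 0.0
    text = url_text.lower()
    return sum(1 for k in keywords if k.lower() in text) / len(keywords)


def enqueue(queues, url, score, threshold=0.5):
    """Queue the URL if its relevance exceeds the threshold; open a new queue when the last is full."""
    if score <= threshold:
        return False  # low relevance: discard the URL
    if not queues or len(queues[-1]) >= QUEUE_LIMIT:
        queues.append([])
    queues[-1].append((score, url))
    queues[-1].sort(key=lambda item: item[0], reverse=True)  # most relevant first
    return True


def next_url(queues):
    """Scan the head of every queue and pop the URL with the highest score."""
    heads = [(q[0][0], idx) for idx, q in enumerate(queues) if q]
    if not heads:
        return None
    _, idx = max(heads)
    return queues[idx].pop(0)[1]


def idf(term, documents):
    """Inverse document frequency, assumed here as log(N / (1 + df))."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / (1 + df))


queues = []
keywords = ["flood", "news"]
enqueue(queues, "http://example.com/flood-news", relevance("flood-news", keywords))
enqueue(queues, "http://example.com/sports", relevance("sports", keywords))
print(next_url(queues))
```

Terms that appear in most documents receive an IDF near or below zero under this smoothed form, which matches the observation that frequently occurring terms are unlikely to be representative.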

Model improvement
Syntax algorithm: when a hash function is used to map fingerprints to the table structure, each string is converted to a number. The similarity of documents is detected according to the ratio or number of identical fingerprints in the final statistical table. If the fingerprint set of file A is L(A), the fingerprint set of file B is L(B), and the similarity between file A and file B is X(A, B), many methods can be used to calculate the similarity. When the cosine measure is used, the greater the cosine value, the higher the similarity. Two description options are given for this purpose in Eq. (16).

The degree centrality of a node represents the number of connections between that node and the other nodes in the network. It is divided into in-degree centrality and out-degree centrality. In social network analysis, degree centrality is the most direct measure of how central a node is. The greater the degree centrality of a node, the more important its position in the network. The calculation method is given by Eq. (17). According to different node locations, the calculation method of the impact is given by Eq. (18).

Eigenvector centrality is an indicator that measures the influence of nodes in social networks. Google's PageRank is a variant of eigenvector centrality. The eigenvector centrality value indicates that the importance of a node depends not only on the number of its neighbors, but also on the importance of those neighbors. The calculation method is as follows:
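The fingerprint comparison described above, with fingerprint sets L(A) and L(B) and a similarity X(A, B), can be sketched as follows. Since the original formulas are not reproduced here, the sketch assumes two common choices: the ratio of shared fingerprints (Jaccard coefficient) and the cosine of the sets viewed as binary vectors. The shingle length k, the use of SHA-256, and the function names are hypothetical.

```python
import hashlib


def fingerprints(text, k=5):
    """Hash every k-character substring (shingle) of `text` to an integer fingerprint."""
    normalized = " ".join(text.lower().split())
    count = max(len(normalized) - k + 1, 1)
    shingles = {normalized[i:i + k] for i in range(count)}
    # Keep the first 8 hex digits of the digest as a 32-bit numeric fingerprint.
    return {int(hashlib.sha256(s.encode("utf-8")).hexdigest()[:8], 16) for s in shingles}


def resemblance(fp_a, fp_b):
    """Ratio of shared fingerprints to all fingerprints (Jaccard coefficient)."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def cosine(fp_a, fp_b):
    """Cosine similarity of the two fingerprint sets viewed as binary vectors."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / ((len(fp_a) ** 0.5) * (len(fp_b) ** 0.5))


doc_a = "Public opinion spreads quickly on the network"
doc_b = "Public opinion spreads rapidly on the network"
fp_a, fp_b = fingerprints(doc_a), fingerprints(doc_b)
print(resemblance(fp_a, fp_b), cosine(fp_a, fp_b))
```

Both measures equal 1 for identical fingerprint sets and 0 for disjoint ones; near-duplicate documents fall in between, so a threshold on either value can be used to flag likely duplicates.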

System requirement analysis
A reasonable information collection space is the key to public opinion information mining. At present, the public opinion information collection space can be summarized as follows. The theme of economic construction is regarded as the center of public opinion. Its basic ideology is spread mainly through official organizations and administrative means, which actively convey positive information to the public, guide people toward a healthy and positive mainstream, build the broadest possible social consensus, and finally achieve unity of thought and action.
Events with social significance are often the focus of public opinion. After a major event occurs, it spreads across the Internet in a short time. The mainstream media and public news organizations report the event fully and in real time, and a large amount of information appears in news media and network reports, including people's reactions to the aftermath and follow-up news.
Because of their flexibility, emergencies are often a flashpoint of network public opinion. An emergency usually goes through a transformation from quantitative to qualitative change. When an epidemic first occurs, for example, its small scale means that it does not attract enough attention. When false information and rumors then spread to the crowd, they cause a degree of public panic and attract the attention of the relevant departments. Public opinion information in this incubation period is well hidden. It is mainly released through personal websites, forums and other unofficial forms, is difficult to detect at first, and generally spreads in a concealed way.
The requirement analysis in this part is mainly based on the currently popular UML modeling tool. All subsequent descriptions use UML diagrams to discuss and analyze the possible cases or scenarios.
Internet search is mainly used by public opinion staff to search for hot and sensitive events on the network by keyword within a given period of time. The search results generate statistical information across web pages, news, blogs, forums, photos, videos, microblogs, and so on. Its main working mechanism is the Internet search engine, through which relevant public opinion information can be obtained.

System architecture and function design
As the functional module for interaction between the database information system and the data set in the system background, the data operation part is mainly used to achieve good data flow control and operation monitoring. The framework composition is shown in Fig. 5.
In the network public opinion information collection system studied in this paper, the MVC architecture design and the database access mechanism design are the core, and the specific data access process is shown in Fig. 6.
The system is divided into six major modules, as shown in Fig. 7.

System test results
At the beginning of the system design, two options were considered for implementing the inverted index: using Lucene to create an in-memory index, and creating an inverted index table in a MySQL relational database. To compare the two design strategies, we designed a read and write experiment for both options. The steps are as follows: (1) clear all existing indexes; (2) under the same forward index, write n different index entries for a keyword, record the total time used, and calculate the number of index entries written per unit time; (3) read all index entries of a keyword 1000 times, record the total time used, and calculate the number of index entries read per unit time. The reads are repeated 1000 times because a single read is very short, and repeated reads reduce the measurement error.
The above procedure is repeated 5 times and the average value is calculated. The experimental results are shown in the table. They show that, compared with Lucene, which works in memory mode and is updated on every change, this project uses MySQL for the inverted index while retaining 84.2% (477.65/567.48) of Lucene's read speed. The main reason why MySQL's storage performance is better than Lucene's when persistence is required is that Lucene must store a large number of index entries again every time the index is updated, not just the entries that changed.
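The three measurement steps above can be sketched as a small timing harness. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption made purely for illustration; the table name, column names, and keyword are hypothetical, and the absolute numbers it prints are not comparable to the results reported in this paper.

```python
import sqlite3
import time


def benchmark(n_entries=1000, n_reads=1000):
    """Time writing n_entries postings for one keyword, then reading them n_reads times."""
    # Assumption: an in-memory SQLite database stands in for the MySQL table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inverted_index (keyword TEXT, doc_id INTEGER)")
    conn.execute("CREATE INDEX idx_keyword ON inverted_index (keyword)")

    # Step 1: clear all existing index entries.
    conn.execute("DELETE FROM inverted_index")

    # Step 2: write n different entries for one keyword and time the writes.
    start = time.perf_counter()
    conn.executemany(
        "INSERT INTO inverted_index (keyword, doc_id) VALUES (?, ?)",
        (("opinion", doc_id) for doc_id in range(n_entries)),
    )
    conn.commit()
    write_seconds = max(time.perf_counter() - start, 1e-9)

    # Step 3: read all entries of the keyword n_reads times and time the reads.
    rows = []
    start = time.perf_counter()
    for _ in range(n_reads):
        rows = conn.execute(
            "SELECT doc_id FROM inverted_index WHERE keyword = ?", ("opinion",)
        ).fetchall()
    read_seconds = max(time.perf_counter() - start, 1e-9)
    conn.close()

    return {
        "rows": len(rows),
        "writes_per_second": n_entries / write_seconds,
        "reads_per_second": n_entries * n_reads / read_seconds,
    }


# Repeat the measurement 5 times and average, as in the procedure above.
runs = [benchmark(n_entries=200, n_reads=100) for _ in range(5)]
print(sum(r["reads_per_second"] for r in runs) / len(runs))
```

Repeating the read loop many times and averaging over several runs smooths out timer resolution and scheduling noise, which is the same reason the procedure above uses 1000 consecutive reads.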
Because the requirement analysis and functional design stages followed strict standards, the system achieved satisfactory results in functional testing. The system interface conforms to users' habits. For non-functional aspects, such as the color style of the interface, test users also gave good feedback. In the other function and performance tests, the stress test was normal, the response speed was good, and the functional indicators responded normally. The overall test results are satisfactory. However, the tests also exposed problems such as low server load capacity under concurrent access during page response, which leads to slow operation, and a lack of security in the system. To solve these problems, the system operators need to improve server performance, add hardware configurations with higher computing performance, and strengthen the security settings.

Conclusion
With the rapid development of the Internet, the number of social network users keeps increasing. Economic globalization, information networking, an increasingly complex society, and a deteriorating natural environment have led to frequent mass emergencies in China in recent years. At the same time, with the rapid deepening of China's reform and the rapid development of society, the public is still adapting to the many new things on the network. Such emergencies easily have a widespread impact on the network, which in turn strongly affects the psychology and behavior of social groups.
Therefore, this paper adopts Python and Java technology to build a network public opinion information collection system. Based on basic theoretical research results and international research trends, and using existing research experiments and system environments, the system establishes a social network public opinion information emergency management and decision-making system that serves China's rapid economic development and its social harmony and stability.

Figures

Functional completeness (Eq. 1): in the formula, a represents the number of functions realized by the project, and b represents the number of functions required by the project.

Design and Application of Internet Public Opinion Information Collection System

Figure 1 Comparison
Figure 2 Comparison of experimental results of different models
Figure 3
Figure 4 Topic
Figure 5 System