Chinese Interest in Thyroid Related Diseases: Evidence from Baidu Index

Background: Common thyroid diseases include hyperthyroidism, hypothyroidism, thyroiditis and thyroid tumors. Baidu, currently the most widely used online search tool in China, has developed an internet search trend collection and analysis tool called the Baidu Index. The aim of the present study was to understand the trends and characteristics of the public’s online attention to thyroid diseases, and to explore the value of the Baidu Index in monitoring online retrieval behavior for thyroid-related information. Methods: Covering the period from January 1, 2011 to December 31, 2019, we used the Baidu Index big data analysis tool with “thyroid nodules”, “thyroid cancer”, “thyroiditis”, “hyperthyroidism” and “hypothyroidism” as keywords. The “search index” and “media index” data were recorded on a weekly basis and aggregated into quarterly and annual totals to generate the final data set for secondary analysis. Pearson correlation analysis was used to analyze the correlation between the search index of each keyword and the year. One-way analysis of variance was used to analyze differences in the search index and in the media index among keywords. Results: Among the five keywords, the thyroid nodule search index had the highest growth rate (640%), followed by thyroid cancer (298%). The media’s attention to thyroid diseases declined year by year. Unlike the public’s attention, the media index of hyperthyroidism was significantly higher than that of the other keywords. Conclusion: Over the past nine years, the public's attention to thyroid-related diseases has increased gradually. The Baidu Index is an effective tool for tracking the health information query behavior of Chinese internet users and can provide a cost-effective supplement to traditional monitoring systems.

The development of the internet has greatly changed people's lives. The expansion of search engines in particular has further enhanced the value of the internet as a tool for life, learning and work. According to the 47th Statistical Report on Internet Development in China, there were approximately 989 million internet users in China by the end of December 2020, and the internet penetration rate reached 70.4% (8). An estimated 81.3% of netizens used search engines, and 77.3% of users could find the information they needed through this service. Baidu search accounted for 90.9% of search engine users, ranking first (9). Analysis of these online search trend data can therefore reflect the patterns of health information search behavior and the interests of internet users at the population level.
This study used the Baidu Index data platform to obtain data for secondary analysis, in order to understand the characteristics of public attention to thyroid diseases (TDs), information search behavior and the trend of media attention, and to explore the value of internet search data in monitoring online information search behavior.

Data from Baidu Index
Data from the Baidu Index (http://index.baidu.com/Helper/?tpl=helpandword=#pdesc) were used. The Baidu Index is a big data sharing platform built from massive user behavior information. It shows the search trends of selected keywords, offers insight into changes in the needs of netizens, monitors trends in media public opinion, and identifies the characteristics of users. The platform provides data such as the search index, demand map, information index, media index and population attributes.
The data used in this study included: 1) search index: based on the search volume of netizens on Baidu, with keywords as the statistical objects, the weighted search frequency of each keyword in Baidu web search is scientifically analyzed and calculated; 2) media index: the number of news reports related to a keyword that were published by major internet media and indexed by the Baidu News Channel; 3) annual netizen search rate: the search index divided by the annual number of netizens (the annual number of netizens was taken from the Statistical Report on Internet Development for that year).
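As a minimal sketch, the annual netizen search rate defined in item 3 can be computed as a simple ratio. The function name and the example figures are hypothetical, and the unit of the netizen count is an assumption, since the text does not state it.

```python
def annual_search_rate(search_index: float, netizens: float) -> float:
    """Return the annual search index divided by the annual number of netizens."""
    if netizens <= 0:
        raise ValueError("annual number of netizens must be positive")
    return search_index / netizens


# Hypothetical figures for illustration only, not data from the study.
example_rate = annual_search_rate(search_index=1_000_000, netizens=500_000_000)
print(example_rate)
```

Dividing by the number of netizens in the same year keeps the rate comparable across years in which the online population itself was growing.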
The keyword "thyroid" was searched in the demand map of the Baidu Index platform, and the weekly keyword demand maps for December 2019 were collected. The TD-related keywords with the highest demand were selected: "thyroid nodule", "thyroid cancer", "thyroiditis", "hyperthyroidism" and "hypothyroidism". Two terms that do not name a thyroid disease, "what are the symptoms of thyroid" and "thyroid function", were excluded. The search index and media index of each keyword were obtained for January 1, 2011 to December 31, 2019, a total of nine complete years. Owing to the limitations of the Baidu Index tool and the needs of the analysis, this study recorded the search index and media index with the week as the smallest unit and aggregated them by quarter and by year as the basis for subsequent data analysis.
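The weekly-to-quarterly and weekly-to-annual aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name and sample records are hypothetical, and each week is assigned to the quarter and year containing its start date, a rule the text does not specify.

```python
from collections import defaultdict
from datetime import date


def aggregate_weekly(records):
    """Sum weekly index values into quarterly and annual totals.

    records: iterable of (week_start_date, value) pairs.
    Assumption: a week belongs to the period that contains its start date.
    """
    quarterly = defaultdict(float)
    annual = defaultdict(float)
    for week_start, value in records:
        quarter = (week_start.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        quarterly[(week_start.year, quarter)] += value
        annual[week_start.year] += value
    return dict(quarterly), dict(annual)


# Hypothetical weekly values for one keyword, for illustration only.
weekly = [
    (date(2011, 1, 3), 10.0),
    (date(2011, 3, 28), 5.0),
    (date(2011, 4, 4), 7.0),
]
quarterly_totals, annual_totals = aggregate_weekly(weekly)
print(quarterly_totals)
print(annual_totals)
```

A different rule for weeks that straddle a quarter or year boundary would shift small amounts between adjacent periods without changing the overall totals.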
We summed the five keyword search indexes for each year to obtain the annual search index. Differences in the annual search index, quarterly search index and annual media index among keywords were analyzed by one-way ANOVA, and the correlation between search index and year was analyzed by Pearson correlation analysis. After drawing the scatter plot and regression lines of the annual netizen search rate, analysis of covariance was conducted to test for statistical differences in the slopes of the regression lines among groups. P<0.05 (two-tailed) was considered statistically significant. Microsoft Office Excel 365 (Microsoft, Redmond, WA, USA) and SPSS version 20.0 (SPSS, Inc., Chicago, IL, USA) were used to draw figures, and all statistical analyses were performed with SPSS.
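The two main statistics named above can be illustrated with a short, dependency-free sketch. The study itself used SPSS, so this is not the authors' code; the function names and sample values are hypothetical, and P values are omitted because they require an F or t distribution that the Python standard library does not provide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; each group is a list of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    # Mean square between groups divided by mean square within groups.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical annual index values for one keyword, for illustration only.
years = list(range(2011, 2020))
index = [100, 130, 170, 220, 300, 380, 470, 560, 640]
print(pearson_r(years, index))

# Hypothetical annual values for three keywords, one group per keyword.
print(one_way_anova_f([[10, 12, 14], [20, 21, 25], [11, 13, 12]]))
```

In this layout, each keyword forms one ANOVA group and its annual (or quarterly) index values are the observations within that group.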

Changes in search index
Over the past nine years, the sum of the annual search indexes of the keywords showed an upward trend and was positively correlated with the year (Pearson's correlation=0.983, P<0.001). Figure 1 shows the trend of the annual search index. The search index of each keyword was also positively correlated with the year (thyroid nodule: Pearson's correlation=0.981, P<0.001; thyroid cancer: Pearson's correlation=0.956, P<0.001; thyroiditis: Pearson's correlation=0.934, P<0.001; hyperthyroidism: Pearson's correlation=0.784, P=0.012; hypothyroidism: Pearson's correlation=0.954, P<0.001). In terms of search index growth, the absolute increase was highest for thyroid nodule (4,236,537), followed by hyperthyroidism (1,845,562). The growth rate was highest for thyroid nodule (640%), followed by thyroid cancer (298%). The changes in each search index over the nine years and their correlations with the year are presented in Table 1. Using the least significant difference method, we found statistically significant differences between the search index of thyroid nodule and those of thyroid cancer, thyroiditis and hypothyroidism (P<0.001), and between the search index of hyperthyroidism and those of thyroid cancer, thyroiditis and hypothyroidism (P<0.001). However, there was no statistically significant difference between thyroid nodule and hyperthyroidism (P=0.838). The search index of thyroid nodule surpassed that of hyperthyroidism for the first time in April 2015 and remained higher for four consecutive years; the search indexes of thyroid nodule and hyperthyroidism were higher than those of the other three keywords throughout the nine years. The specific results are shown in Table 2. As shown in Figure 2, the annual netizen search rate showed an upward trend over the past nine years, and the slopes of the regression lines of all five keywords were greater than 0.
Analysis of covariance showed a statistically significant difference in the slopes of the regression lines among the groups (F=16.876, P<0.001).
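The slope comparison starts from a least-squares regression line of search rate on year for each keyword. The sketch below computes only those slopes; it does not reproduce the analysis of covariance performed in SPSS. The function name, keyword labels and values are hypothetical.

```python
def ols_slope(years, rates):
    """Least-squares slope of rates regressed on years."""
    n = len(years)
    if n != len(rates) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(years) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rates))
    return sxy / sxx


# Hypothetical annual search rates for two keywords, for illustration only.
years = [2011, 2012, 2013, 2014]
rates_by_keyword = {
    "keyword_a": [0.001, 0.002, 0.004, 0.007],
    "keyword_b": [0.003, 0.003, 0.004, 0.004],
}
slopes = {name: ols_slope(years, rates) for name, rates in rates_by_keyword.items()}
print(slopes)
print(all(slope > 0 for slope in slopes.values()))
```

A positive slope corresponds to a rising search rate over time; whether the slopes differ from one another is a separate question that requires a formal test such as the analysis of covariance reported above.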
Unlike the keyword search index, the media index showed a downward trend over the nine years (Pearson's correlation=-0.835, P=0.005). Table 3 shows the changes in the media index of each keyword over the nine years. The media index of hyperthyroidism was statistically different from that of thyroid nodule (P=0.039), thyroiditis (P<0.001) and hypothyroidism (P=0.010). The comparisons among the other keywords are shown in Table 4.

Discussion
The results of this study showed that public attention to TDs increased over the past nine years, but that it differed among diseases. Attention to thyroid nodule and hyperthyroidism was significantly higher than attention to hypothyroidism, thyroid cancer and thyroiditis, and the growth rate of the thyroid nodule search index was more than twice that of the second-ranked keyword. Although all keywords showed an upward trend, the rise of thyroid nodule was more pronounced than that of the other four keywords. This might be related to the increase in the prevalence of thyroid nodule in recent years (10). The onset of thyroid nodules is insidious, most patients are asymptomatic in the early stage, and patients are more likely to look up relevant information on their own after noticing discomfort (11). As the largest search tool in China, Baidu can reflect people's needs well through its search data. The disease prediction product jointly developed by Baidu and the Chinese Center for Disease Control and Prevention can provide real-time data on infectious diseases (12). It can also be used to predict epidemic trends, as a powerful complement to the traditional monitoring system (13). To our knowledge, the Baidu Index has not previously been used for TD-related research in China. Our study is the first attempt to explore the behavior and interest of Chinese netizens regarding TDs, confirming the potential of online search trend data to represent the real situation of patients with TDs in China.
For the media index, the results showed that media attention to hyperthyroidism was higher than to the other keywords. This suggests that the media pushed and reported more information about hyperthyroidism to the public over the past nine years. Overall, media attention to TDs was on the decline. The reasons might include the rapid development of the internet, the fragmentation of news topics, the shortage of media practitioners and the decline in the number of media outlets covering TDs.
Despite the huge medical expenditure that TDs impose on China (14), the country's vast territory and large population make it difficult to evaluate the true prevalence of TDs and to understand the characteristics and needs of patients with TDs. With the wide application of the internet and the public's increasing reliance on search engines as the main way to look up health information, several online digital disease surveillance tools have been explored in recent years (15)(16)(17). As a query tool, a search engine can provide sensitive information on a disease before its diagnosis is reported, thus improving disease control. Internet big data has broad application prospects in the medical field and may supplement and extend current clinical and epidemiological data. Today, with the rapid development of internet services and search engines, network data analysis can be regarded as an auxiliary means of traditional disease monitoring.

Limitations
This study has several limitations. First, we focused only on the attention of Baidu search engine users to TDs, without considering public attention on other search engines or social media, so the results reflect only part of the public's attention to TDs. Second, there might be sampling bias in the Baidu Index. Although the internet penetration rate in China has greatly improved, internet users are still skewed toward segments of the population with higher socioeconomic status and better education. Third, the Baidu Index algorithm has not been made public.

Conclusion
Between 2011 and 2019, the online search rate for TDs grew steadily while the media index showed a downward trend. The Baidu Index can be used to track Chinese netizens' online behavior and interest regarding TDs. This may help to improve our understanding of disease incidence, patient education and the use of online resources. Internet search trend data are a valuable source for monitoring search behavior for TD-related information. They can be used as an exploratory tool to better understand the characteristics and preferences of patients and to provide scientific evidence for the control and prevention of TDs in China.

Figure: Scatter plot of annual netizen search rate