Tourism is an important driving force for economic development, so improving the modern tourism system and promoting the high-quality development of tourism are essential. As a vital part of tourism management, accurate prediction of tourist flow is of practical significance for the efficient operation and management of tourism destinations and businesses such as scenic spots, hotels, and tourism companies.

On the one hand, tourism companies can use forecast results to design tour packages with dynamic pricing strategies that increase tourist flow during low-demand periods. On the other hand, by optimizing resource allocation and formulating emergency plans, these companies can avoid the mismanagement caused by tourist congestion during peak periods. Because tourism demand exhibits seasonal patterns and holiday effects, FB Prophet has been used to predict the daily tourism demand of Jiuzhaigou Valley Scenic and Historic Interest Area and Macao, demonstrating its ability to handle seasonal patterns and holiday impacts and prompting further reflection[1]. To alleviate traffic congestion and predict tourism demand effectively, fuzzy regression analysis based on neural networks has been used to generate data intervals, and the optimal non-fuzzy performance values obtained from these intervals have been applied to optimize grey prediction models, providing a reference for the effective prediction of tourism traffic[2]. Therefore, tourist flow prediction is of great significance to the sustainable development of the tourism industry.

Modern Internet information technology has accelerated the high-quality development of tourism. To promote innovation in tourism services and management, a deep-integration model of “Internet plus tourism” has gradually emerged. With the assistance of Internet technologies, more effective and timely information can be provided for tourism prediction, such as real-time monitoring data of tourist flow, tourism website search volume, social media data, and other network big data, all of which are conducive to tourist flow prediction. In recent years, the use of network data in tourist flow prediction has become a research focus in the field of tourism forecasting. Several scholars have used search index data and other network data to predict tourist flow and achieved good prediction results[3–5]. For example, a bidirectional long short-term memory network combined with an attention mechanism (ATT-BiLSTM) has been used to extract features from a set of predictors, including historical tourist volume, search engine data, weather data, and rest-day data, to predict daily tourism demand[6]. However, most existing studies focus only on predicting tourist flow from search indexes or other single influencing factors, and how to predict tourist flow with multi-source Internet big data deserves further exploration. In fact, multi-source Internet big data drawn from search indexes and social media can reflect the attention, emotions, and attitudes of tourists, which can help improve the accuracy of tourist flow forecasts and enable tourism managers to make timely strategic decisions[7]. In this study, multi-source Internet big data, such as search indexes and online reviews on tourism websites, are applied to tourism prediction to verify the effectiveness of multi-source information.

Tourist flow is generally nonlinear and highly volatile, and it can be affected by multiple factors. Traditional tourist flow prediction usually adopts linear regression and econometric models (e.g., the autoregressive moving average model), which fail to capture the complex characteristics of the data. To overcome the limitations of traditional methods, some studies have combined decomposition with prediction: the original data are first decomposed into several component series, and a suitable prediction method is then selected for each component according to its characteristics, such as mutability and seasonality[8]. Huang et al. proposed an improved deformation prediction algorithm based on the GM (1,1) model and empirical mode decomposition[9]; compared with applying the GM (1,1) model directly, their method predicts landslide deformation better after decomposition. However, decomposition based on a single method is often incomplete. Therefore, this study adopts multiple decomposition methods to extract data features from different perspectives, which both exploits the strengths of different models and avoids the drawback of incomplete decomposition. In recent years, machine learning and deep learning prediction methods have been widely used in various fields with good performance[10–12]. For example, deep learning methods significantly outperform traditional methods in predicting the daily traffic volume of rural roads[13]. Therefore, for the decomposed series with different fluctuation frequencies, this paper selects a prediction method (e.g., deep learning) for each series according to its characteristics and combines the results into a final prediction.
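The decompose-then-predict idea described above can be illustrated with a minimal Python sketch. It assumes a simple trailing moving-average split of the series into a smooth trend and a fluctuating residual, and two deliberately simple forecasters: least-squares linear extrapolation for the trend and a seasonal-naive rule (with a hypothetical weekly period) for the residual. All function names are hypothetical, and the sketch stands in for neither the hybrid decomposition nor the deep learning models proposed in this paper; it only shows how component forecasts are made separately and then summed.

```python
from typing import List, Tuple


def moving_average_decompose(series: List[float], window: int) -> Tuple[List[float], List[float]]:
    """Split a series into a trend and a residual (series - trend).

    The trend at time t is the mean of the last `window` observations
    (fewer at the start), so trend + residual reproduces the series exactly.
    """
    trend = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        trend.append(sum(chunk) / len(chunk))
    residual = [x - m for x, m in zip(series, trend)]
    return trend, residual


def linear_trend_forecast(component: List[float], horizon: int) -> List[float]:
    """Fit y = a + b*t by least squares and extrapolate `horizon` steps ahead."""
    n = len(component)
    t_mean = (n - 1) / 2
    y_mean = sum(component) / n
    denom = sum((t - t_mean) ** 2 for t in range(n))
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(component))
    slope = num / denom if denom else 0.0
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def seasonal_naive_forecast(component: List[float], horizon: int, period: int) -> List[float]:
    """Repeat the value observed one period earlier (e.g. same weekday last week)."""
    last_cycle = component[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


def decompose_and_forecast(series: List[float], horizon: int,
                           window: int = 7, period: int = 7) -> List[float]:
    """Forecast each component with its own method, then sum the forecasts."""
    trend, residual = moving_average_decompose(series, window)
    trend_fc = linear_trend_forecast(trend, horizon)
    resid_fc = seasonal_naive_forecast(residual, horizon, period)
    return [a + b for a, b in zip(trend_fc, resid_fc)]


if __name__ == "__main__":
    # Hypothetical daily visitor counts: a weekly pattern that rises slowly over 4 weeks.
    weekly = [100, 80, 80, 85, 90, 150, 160]
    history = [float(v + 2 * week) for week in range(4) for v in weekly]
    print(decompose_and_forecast(history, horizon=7))
```

Each component gets the forecaster suited to its behavior (smooth versus periodic), which mirrors the rationale for choosing a prediction method per decomposed series; in a fuller system, the moving-average split would be replaced by stronger decomposition methods and the toy forecasters by learned models.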

Thus, this paper proposes a new multi-scale combined prediction method for tourist flow driven by Internet big data. The contributions of this research are as follows. Firstly, unlike traditional studies, this paper considers the impact of multi-source Internet big data, such as the Baidu Index and sentiment analysis of tourism website reviews, on tourist flow, which provides richer information for tourist flow prediction and improves its timeliness. Secondly, to handle the nonlinear fluctuations of tourist flow data, this paper uses a hybrid decomposition method to fully extract data features. Finally, this paper proposes a combined prediction method based on deep learning to improve prediction accuracy.