Designing a forecasting assistant of the Bitcoin price based on deep learning using market sentiment analysis and multiple feature extraction

Fluctuations in the price of Bitcoin have a striking impact on individual profit and loss, international relations, and trade. Accordingly, a model that accounts for the significant factors and predicts the Bitcoin price as accurately as possible is essential. The current paper therefore uses market sentiment and multiple feature extraction to present several Bitcoin price prediction models based on a convolutional neural network (CNN) and long short-term memory (LSTM). The proposed models take Twitter data, news headlines, news content, Google Trends, and Bitcoin-based stock and financial data as the inputs from which deep learning predicts the Bitcoin price. In addition, the proposed models apply Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment analysis to the latest cryptocurrency market news. Given these inputs and analyses, several feature selection methods, including mutual information regression, linear regression, correlation-based selection, and a combination of the three, are exploited to predict the price of Bitcoin. Finally, the proposed models are compared in terms of mean square error (MSE), root-mean-square error, mean absolute error, median absolute error, and coefficient of determination (R2). The results indicate that the proposed hybrid model based on sentiment analysis and combined feature selection, with an MSE of 0.001 and an R2 of 0.98, estimates the Bitcoin price with smaller errors than the other models. The proposed model can also serve as a personal assistant for more informed Bitcoin trading decisions.


Introduction
The last decades have witnessed remarkable growth in the use of digital currencies by people and organizations. Cryptocurrencies have received much attention and are being widely examined in the literature (Chaudhari and Crane 2020; Dai et al. 2021; Zuiderwijk et al. 2021; ElRahman and Alluhaidan 2021; Li et al. 2021; Mensi et al. 2019; Papadamou et al. 2021). Cryptocurrency has been introduced as a novel and emerging topic governed by the blockchain's cryptographic protocol (Chohan 2017), and it has revolutionized people's thinking about money (Pant et al. 2018). The value of cryptocurrencies has also risen significantly with their continuing adoption and widespread real-world use. Given this value, some people consider them equal to real, or fiat, currencies, whereas others regard them as a good investment opportunity. From $863 on January 9, 2017, the value of a Bitcoin rose by about 2000% to a peak of $17,550 on December 11, 2017. Eight weeks later, on February 5, 2018, the price had fallen to less than half of that peak, about $7900. Nevertheless, the promising technology behind cryptocurrencies, namely the blockchain, is expected to increase their use. Kristoufek stated that Bitcoin is a unique asset, yet its price acts like that of a standard financial asset (Kristoufek 2015). Bitcoin is regarded as the first decentralized digital currency, in which transactions are conducted directly between users with no intermediary (Naimy and Hayek 2018; Matta et al. 2015; Gao and Su 2020). This type of currency differs fundamentally from what is employed in the prevailing monetary system. Cryptocurrency is created through mining, which has led to considerable changes in the online economic activities of users worldwide (Jain et al. 2018). Because cryptocurrency prices do not follow their past behavior, predicting them is very challenging. In addition, the large fluctuations in the price, random effects in the market, and the influence of many different factors on the price of Bitcoin have created a new global challenge. Hence, predicting variations in the price of Bitcoin is of great importance. At the same time, there are many opportunities for better understanding the drivers of the Bitcoin price (Karalevicius et al. 2018).
Moreover, since no central governing authority controls the digital currency and it is mainly affected by the general public, Bitcoin is regarded as a volatile currency whose value changes with socially constructed ideas. Sentiment analysis is therefore of great importance in Bitcoin price prediction, and many authors have studied it. The work of Daniel Kahneman and Amos Tversky showed that decisions in this field are influenced by sentiments (Kahneman and Tversky 1979). The study of R. J. Dolan, ''Emotions, Cognition, and Behavior,'' further confirms that sentiments strongly affect decision-making (Dolan 2002). Sentiment analysis indicates that the demand for a product, and hence its price, may be influenced by more than its economic fundamentals. In recent years, researchers have found that people's purchasing decisions are influenced by the information they gather online (Mittal et al. 2019). Galen Thomas Panger stated that Twitter sentiments are related to people's overall sentimental state. In addition, it was revealed that social media such as Twitter has a calming effect on the user's sentimental state rather than a reinforcing one (Panger 2017). Based on a textual analysis of an investor-oriented social platform called ''Seeking Alpha,'' Chen et al. stated that the opinions expressed in the articles and comments submitted to ''Seeking Alpha'' were highly informative and could even predict earnings surprises (Chen et al. 2013). In a similar study, Tetlock demonstrated that high levels of media pessimism about the stock market directly affect trading volume (Tetlock 2007). Finally, Gartner pointed out that most users consult social media before making their final purchasing decisions (Pettey 2010).
Over time, an extensive literature has developed on the effectiveness of tweet sentiments. Kouloumpis et al. showed that standard natural language processing methods, like sentence scoring, are less effective on tweets because of their short length and distinctive writing style (Kouloumpis et al. 2011). Pak and Paroubek divided individual tweets into positive, negative, and neutral categories so that the computer could better understand their sentiments (Pak and Paroubek 2010). O'Connor et al. indicated that the sentiments in tweets reflect the public opinion on various topics measured in public opinion surveys (O'Connor et al. 2010). That study identified sentiment analysis as a more cost-effective alternative to public opinion surveys. On this view, the sentiments expressed in tweets reflect how most people feel about a topic and can therefore be considered for predicting demand and the resulting product price variations. In another study, the researchers found that employment-related searches were related to the unemployment rate (Ettredge et al. 2005). A relationship between the volume of search queries and the volume of stock trading on NASDAQ was observed in the study of Bordino et al. (2012). Choi and Varian have also conducted specific studies on Google Trends and presented remarkable results (Choi and Varian 2012). According to their results, simple seasonal models that take Google Trends data as input outperform models that do not use Google Trends. Also, Asur and Huberman found that how strongly keywords about a newly released film were trending accurately predicted its box-office revenue (Asur and Huberman 2010).
Overall, sentiment data can be used to predict variation in macroeconomic statistics, and many studies have been performed in this field. Several researchers, including Choi and Varian (2012) and Ettredge et al. (2005), have shown that web-based search data, of which Google Trends data is an example, can be used for forecasting, which motivates its use for predicting the price of Bitcoin. Sul, Dennis, and Yuan collected valence scores of tweets about S&P 500 companies and found a correlation between them and stock prices (Sul et al. 2014). De Jong et al. analyzed minute-by-minute stock prices and tweet data for the 30 stocks in the Dow Jones Industrial Average (Jong et al. 2017). They found that 87% of stock returns were affected by such tweets; the authors also examined the reverse direction, in which the prices affect the tweets. Bollen et al. used a self-organizing fuzzy neural network to predict price changes in the Dow Jones Industrial Average and obtained 86.7% accuracy from Twitter sentiments (Bollen et al. 2011). Evita Stenqvist and Jacob Lönnö presented a study, ''Predicting Bitcoin price fluctuation by analyzing Twitter sentiments,'' and obtained striking results (Stenqvist and Lönnö 2017). The authors collected and processed tweets about Bitcoin, together with Bitcoin prices, from May 11 to June 11. Unrelated or uninformative tweets were then eliminated from the analysis. After that, the authors used the VADER (Valence Aware Dictionary and sEntiment Reasoner) method to analyze the text of the tweets, and they labeled the sentiment of each tweet as negative, neutral, or positive. Lamon et al. employed the sentiment of news headlines and tweets to predict price changes in Bitcoin, Litecoin, and Ethereum (Lamon et al. 2017). Their results show that logistic regression performs remarkably well at classifying these tweets. The authors correctly predicted 43.9% of the price increases and 61.9% of the price decreases. Colianni et al. collected tweets from November 15, 2015, to December 3, 2015, and used Naive Bayes and support vector machines to classify the tweets and reach higher price prediction accuracy (Colianni et al. 2015). Finally, Shah and Zhang successfully presented a strategy using historical prices and Bayesian regression analysis (Shah and Zhang 2014).
Traditional time series prediction techniques like Holt-Winters exponential smoothing models rest on linear assumptions and require the data to be decomposed into trend, seasonal, and noise components to be effective (Chatfield and Yar 1988). These traditional methods are of limited use here, since the Bitcoin market largely lacks seasonality and is highly volatile. To tackle this drawback, deep learning (DL) technology has been introduced as a novel technique that reduces the cost and complexity of the calculations (McNally et al. 2018). Unlike the traditional linear statistical models, artificial intelligence (AI) methods are able to capture nonlinear properties. Notably, artificial neural networks (ANNs) with DL algorithms are regarded as the most successful methods due to their remarkable predictive capabilities (Nakano et al. 2018). In 2017, Radityo et al. employed ANNs to forecast the next-day price of Bitcoin (Radityo et al. 2017). Four types of ANN algorithms were considered in that study, namely neuro-evolution of augmenting topologies (NEAT), genetic algorithm neural network (GANN), genetic algorithm backpropagation neural network (GABPNN), and backpropagation neural network (BPNN). Using machine learning algorithms such as generalized linear models and random forests, Madan et al. modeled Bitcoin price prediction as a binomial classification problem (Madan et al. 2015). In 2008, Zhu et al. used the volume of stock transactions as a neural network input to improve forecasting performance in the medium and long term and presented acceptable results (Zhu et al. 2008). A modular neural network was employed by Kimoto et al. to predict the best times to buy and sell (Kimoto et al. 1990). Guresen et al. compared the performance of different neural networks in stock market prediction and reported that a multilayer perceptron (MLP) neural network outperforms the others (Guresen et al. 2011). In contrast, McNally et al. stated that the capabilities of the recurrent neural network (RNN) and the long short-term memory (LSTM) outweigh the benefits of the MLP due to the temporal nature of Bitcoin data (McNally et al. 2018). Similarly, in 2019, Tandon et al. presented a model to forecast the Bitcoin price using RNN and LSTM with tenfold cross-validation. The proposed model was compared with other available models, including RNN with LSTM, linear regression, and random forest, and it was reported to outperform them. In 2020, Dutta et al. used a gated recurrent unit method to forecast the Bitcoin price and obtained acceptable results (Dutta et al. 2020). In 2021, Ramadhan et al. also used LSTM-RNN to predict the Bitcoin price (Ramadhan et al. 2021). Das et al. presented a hybrid ANN-based Bitcoin price prediction method using Bi-LSTM and Bi-RNN in 2021 and showed the benefits of the proposed method (Das et al. 2021). Despite this interest, as far as we know, no study has yet addressed Bitcoin price prediction that jointly considers Twitter data, news headlines, news content, Google Trends, and Bitcoin-based stock and financial data using CNN and LSTM.
Using CNN and LSTM, the current paper proposes a model for forecasting variations in the price of Bitcoin. For this purpose, several textual sources, namely news headlines, news content, and tweets, are subjected to sentiment analysis. The data are obtained by using the Twitter API through the Python library ''Tweepy,'' by extracting news text and content from the Telegram channel of Cointelegraph, the reference site for cryptocurrencies, and by retrieving Google Trends data. First, the stored tweets in which Bitcoin is mentioned are collected. Then, the tweets are analyzed to calculate a sentiment score for each day, which is compared with the scores of other days. After that, the price on that day is examined to determine whether there is a relationship between the tweets and the price variations. In this way, variations in the price of cryptocurrencies can be determined using sentiment. The proposed models are compared in terms of performance criteria such as mean square error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), median absolute error (MedAE), and coefficient of determination (R2). The major contributions of this paper are summarized as follows:
The remainder of this paper is organized as follows: The fundamental concepts of the present study, including an overview of cryptocurrencies, Twitter, sentiment analysis, and Google Trends, are explained in the second section. Data collection is described in the third section. The research method and the selection of model inputs are examined in the fourth section. Finally, a summary of the present study, the main conclusions, and suggestions for future studies are presented in the fifth section.

Preliminaries
The analysis presented in this paper requires an understanding of why and how cryptocurrencies differ from fiat currencies and from company stocks in the traditional stock market. This section explains these differences and clarifies why cryptocurrencies are used. Since cryptocurrencies are part of a broader technology (the blockchain), Twitter activity about them can be highly informative. It should be noted that Google Trends data and the volume of tweets represent the overall level of interest in holding cryptocurrencies. Hence, the basic concepts concerning cryptocurrencies, Twitter, sentiment analysis, and Google Trends are described here.

Blockchain and cryptocurrencies
The data of the first cryptocurrency in the world are analyzed in this paper. Bitcoin was the first cryptocurrency to be created, and it is the largest in terms of market size, followed by Ethereum. Its creation is mysterious, since it was created by a person or group of people using the name ''Satoshi Nakamoto'' in 2009. Ahead of the launch, Satoshi Nakamoto presented a paper entitled ''Bitcoin: A Peer-to-Peer Electronic Cash System'' (Nakamoto and Bitcoin 2008). The paper outlines an electronic, peer-to-peer counterpart to cash payments. The cryptocurrency can be sent directly from one party to another without a third party to verify the transaction between them. This innovation relies on the ''blockchain,'' which acts as a shared ledger of all transactions. The peer-to-peer network verifies every transaction to prevent forgery.
The applications of blockchain technology go far beyond peer-to-peer payments, because the technology provides security, privacy, and decentralization. A decentralized ledger exploits the blockchain for IoT applications, isolated storage systems, healthcare, and more (Xu and Croft 1998). This range of applications has led to the creation of more blockchains and cryptocurrencies. Furthermore, the use of the blockchain increases the usage of cryptocurrencies and gives them an intrinsic value whose amount depends on many factors, mainly because the technology is new and still under debate. Notably, knowing the type of a currency and how it stores its value helps in understanding what can lead to price changes.

Twitter
Twitter was created in July 2006 as a microblogging service and belongs to the same family of social applications and websites as Instagram, Facebook, and LinkedIn. A microblog is a medium that allows smaller and more frequent updates than blogging. Twitter allows users to send public messages (called ''tweets'') that were originally limited to 140 characters, a limit that was doubled on November 6, 2017, to 280 characters per tweet. Users can add a ''hashtag'' to a tweet, denoted by the symbol ''#.'' The symbol is followed by a sequence of characters that identifies the subject of the tweet and makes it searchable. Hashtags are used later when collecting tweets in the data section. Twitter data play a significant role in identifying people's opinions about a specific product or service. This helps financial managers detect angry customers and their opposing points of view. For Bitcoin, Twitter also serves as a social media channel for crypto projects, which helps direct traffic to their websites and enhance brand awareness.
Notably, Twitter has gained popularity rapidly since its launch in 2006. Evidence of its importance dates back to January 15, 2009, when a US Airways plane came down in the Hudson River in the USA. An image of the incident posted on Twitter broke the record for the number of views. 83% of the world's leaders have Twitter accounts, and Twitter earns nearly $330 million a month with 1.3 billion users. Given such statistics, the Twitter database can be remarkably rich and efficient. It is a great source of information on how people feel about almost anything. Because every tweet carries the time at which it was sent, it is also possible to observe how these feelings change over time. Hence, Twitter is regarded as a remarkable resource for collecting textual data on a topic such as cryptocurrencies and for exploring possible relationships between that text and their prices.

Sentiment analysis
90% of the global data has been generated in the last 2 years. Most of these data are unstructured text, such as tweets, articles posted on the Internet, text messages, and emails. ''Natural language processing'' (NLP), an actively studied and developing field, is a set of methods that allow computers to analyze and understand text. In this paper, a set of natural language processing tools called ''sentiment analysis'' is employed. Sentiment analysis is conducted to extract and measure the sentiments or subjective opinions expressed in a text. There are several methods to do this, but the ''VADER'' (Valence Aware Dictionary and sEntiment Reasoner) method is selected in this study (Manning and Schutze 1999).
The aim here is to apply sentiment analysis to the collected tweets to determine which tweets contain positive or negative comments regarding cryptocurrencies. Notably, there is a fundamental difference between emotion and sentiment here: emotion is raw, whereas sentiment is structured and organized. Overall, emotions are faster, more intense, and more reactionary than sentiment, which forms after more deliberate thought. Sentiment analysis considers the polarity of a text, that is, whether it is positive, negative, or neutral. Nevertheless, feelings and emotions like happiness, anger, and sadness are sometimes also examined in sentiment analysis.
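To make the polarity idea concrete, the following is a minimal sketch of a lexicon-based scorer in the spirit of VADER. It is not the VADER library itself: the tiny lexicon, the negation handling, and the function names are hypothetical simplifications. The normalization `x / sqrt(x^2 + 15)` and the ±0.05 cut-offs follow the conventions commonly used with VADER's compound score.

```python
import math

# Hypothetical mini-lexicon (word -> valence); the real VADER lexicon
# contains thousands of human-rated entries.
TOY_LEXICON = {"bullish": 2.5, "gain": 1.8, "great": 3.1,
               "crash": -2.6, "scam": -3.0, "loss": -1.9}
NEGATIONS = {"not", "no", "never"}


def compound_score(text, alpha=15.0):
    """Sum word valences and squash the total into the range (-1, 1)."""
    total = 0.0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?#@")
        if word in NEGATIONS:
            negate = True          # simplified: flip the next word only
            continue
        valence = TOY_LEXICON.get(word, 0.0)
        total += -valence if negate else valence
        negate = False
    return total / math.sqrt(total * total + alpha)


def polarity(text, threshold=0.05):
    """Map the compound score to positive / negative / neutral."""
    score = compound_score(text)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


print(polarity("Bitcoin is bullish, great gain!"))
print(polarity("Bitcoin crash is a scam"))
print(polarity("Bitcoin price today"))
```

A daily sentiment feature could then be formed by averaging the compound scores of all tweets posted on that day, which is one plausible way to align text with daily prices.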

Google trends
In many parts of the world, almost every aspect of daily life involves the Internet. Browsing the Internet is conducted through search engines, and Google is currently the most popular search engine in the world, handling 74.52% of searches. Therefore, Google search data can provide credible insights into what the world is interested in and how strong that interest is. Google makes these data available through Google Trends. The data describe the popularity of a search term relative to other terms. The Google Trends ranking of cryptocurrency terms varies over time, and this variation can be related to increases and decreases in public interest and in the price of cryptocurrencies.

Headline and the main text of the day's news
Because the price of cryptocurrencies depends significantly on positive and negative news and the cryptocurrency market largely follows fundamental analysis, we decided to extract news from the most globally reputable news site in the field of cryptocurrency, i.e., Cointelegraph, to increase the accuracy. For the period from ''2021/02/05'' to ''2021/09/10,'' the sentiment extraction and analysis applied to the Twitter data were also applied to the news to see how effective the news is in determining the Bitcoin price.

Methodology
This section describes the proposed method for predicting the Bitcoin price. It also outlines the main techniques and neural networks used to reach the final results and predict the Bitcoin price accurately.

The proposed model
This section uses market sentiment and multiple feature extraction to build a price forecasting assistant, that is, a predictive model based on CNN and long short-term memory (LSTM). The proposed model consists of several parts, each of which is described separately below. The flowchart of the proposed method is shown in Fig. 1 for better understanding. VADER sentiment analysis is used in the proposed models to examine the latest market news on cryptocurrencies. The models employ Twitter data, news headlines, news content, Google Trends, and Bitcoin stock and financial data as deep learning inputs to forecast the Bitcoin price more accurately. Moreover, because a large number of features is extracted from the different inputs, mutual information regression, linear regression, and correlation-based feature selection methods are exploited. A combination of the three feature selection methods is considered in a separate model to benefit from their respective advantages. Based on these inputs, nine different models are developed with CNN and LSTM to forecast Bitcoin prices. Each of the proposed models uses different layers and separate input data so that the effect of each input on the Bitcoin price prediction can be examined. Finally, the proposed models are compared with each other in terms of criteria such as MSE, RMSE, MAE, MedAE, and R2.
According to the flowchart shown in Fig. 1, this section consists of several main sub-sections: data collection and data set, text preprocessing and text feature extraction, data normalization, VADER-based sentiment analysis, feature selection, the proposed deep learning models, and evaluation of the performance criteria. The main points and methods of each sub-section are described below, and nine models are then introduced so that their outputs can be compared. The data sets, feature selection methods, and learning techniques differ across the proposed models. In the hybrid strategy, the three methods, namely correlation, linear regression, and mutual information, jointly conduct the feature selection process. The tweet features thus pass through three feature selection steps, and the outcomes are fed into the deep learning model based on CNN and LSTM. Since accurate prediction of the Bitcoin price is crucial in financial management, a special effort was made to raise the accuracy of the feature selection as much as possible. The limitations of the three individual methods are overcome in the hybrid strategy, as the methods compensate for one another.
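The text does not spell out how the outputs of the three selectors are merged. One simple possibility, shown here only as an illustrative assumption, is to rank the features under each method and keep those with the best average rank. The function name, feature names, and scores below are all hypothetical.

```python
def average_rank_selection(score_tables, top_k):
    """Rank features in each table (highest score = rank 0),
    average the ranks across tables, and keep the top_k features."""
    features = sorted(score_tables[0])
    mean_rank = {name: 0.0 for name in features}
    for scores in score_tables:
        ordered = sorted(features, key=lambda name: scores[name], reverse=True)
        for rank, name in enumerate(ordered):
            mean_rank[name] += rank / len(score_tables)
    return sorted(features, key=lambda name: mean_rank[name])[:top_k]


# Hypothetical relevance scores from the three selectors.
correlation = {"tweet_sentiment": 0.62, "news_sentiment": 0.48,
               "trend_index": 0.55, "tweet_volume": 0.10}
linear_reg = {"tweet_sentiment": 0.40, "news_sentiment": 0.35,
              "trend_index": 0.20, "tweet_volume": 0.05}
mutual_info = {"tweet_sentiment": 0.30, "news_sentiment": 0.33,
               "trend_index": 0.25, "tweet_volume": 0.02}

selected = average_rank_selection([correlation, linear_reg, mutual_info], top_k=2)
print(selected)
```

Other combinations, such as applying the three selectors one after another or taking the intersection of their selected sets, would fit the description equally well.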

Deep neural networks
A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. DNNs come in various types, but the same components always exist in these networks: neurons, synapses, weights, biases, and functions. DNNs have been widely used in related work due to their remarkable applicability (Surendar 2021; Soni et al. 2021). In short, the DNN literature (Awoke et al. 2021; Liu et al. 2021) strongly suggests that this technology is highly beneficial for developing Bitcoin price prediction models.

LSTM networks
Long short-term memory (LSTM) networks are a special type of recurrent neural network that can learn long-term dependencies. Hochreiter and Schmidhuber first proposed these networks in 1997. Many researchers have since contributed to improving them.
The main aim in designing LSTM networks was to deal with the problem of long-term dependency. Remembering information for long periods is the default behavior of LSTM networks, and their structure allows them to learn from information that lies far back in the sequence.
All recurrent neural networks take the form of a chain of repeating neural network modules (units). In standard recurrent neural networks, the repeating module has a simple structure, for instance a single hyperbolic tangent (tanh) layer.
LSTM networks have a similar chain structure, but the repeating module is structured differently. Instead of a single neural network layer, it contains four layers that interact with each other according to a special structure.
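For reference, the four interacting layers are usually written as the forget gate, the input gate, the candidate cell state, and the output gate. The notation below is the standard textbook formulation and is an assumption here, since the paper does not give its own symbols: $x_t$ is the input at time $t$, $h_t$ the hidden state, $C_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)}\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t\\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

The additive update of $C_t$ is what lets information persist over many time steps, in contrast to the single tanh layer of a standard recurrent module.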

CNN neural networks
The convolutional neural network is similar to other neural networks (e.g., the MLP neural network) and is composed of layers of neurons with learnable weights and biases. The following steps occur in each neuron:
• The neuron receives a set of inputs.
• The inner product of the neuron's weights and the inputs is computed.
• The result is added to the bias.
• Finally, the sum is passed through a nonlinear function (the activation function).
The above process is conducted layer by layer and reaches the output layer, creating the network forecast.
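The per-neuron steps listed above can be sketched in a few lines of plain Python. This is a generic illustration rather than the paper's network: the tanh activation, the function names, and the numbers are assumptions chosen for the example.

```python
import math


def neuron_output(inputs, weights, bias):
    """Inner product of weights and inputs, plus bias, through tanh."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(weighted_sum + bias)


def layer_output(inputs, weight_rows, biases):
    """Apply one neuron per weight row to the same input vector."""
    return [neuron_output(inputs, row, b) for row, b in zip(weight_rows, biases)]


# Two inputs feeding a layer of two neurons (illustrative values).
hidden = layer_output([1.0, 2.0], [[0.5, -0.25], [0.1, 0.2]], [0.0, 0.5])
print(hidden)
```

Stacking calls to `layer_output`, with each layer's result used as the next layer's input, gives the layer-by-layer forward pass that ends in the network's forecast.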

Feature selection methods
Feature selection is the process of identifying the smallest possible number of features in a data set that can still describe the set and its main characteristics (Alweshah et al. 2021). It aims to eliminate unnecessary features and select the vital ones according to the data set and its class (Şahin et al. 2021). Feature selection has been demonstrated to be an effective and efficient data preprocessing technique for preparing data (especially high-dimensional data) for a variety of data mining and machine learning problems. Building easier-to-understand models, enhancing data mining performance, and creating clean, comprehensible data are all goals of feature selection (Li et al. 2017). Feature selection is a crucial task in data mining and machine learning because it decreases the dimensionality of the data and improves the performance of an algorithm, such as a classification algorithm (Xue et al. 2016). However, because of the large search space, feature selection is difficult. Numerous approaches have been used to address feature selection problems, and evolutionary computation (EC) methods have recently attracted a lot of interest and shown some promise.
Finding the ideal subset of features is NP-hard (Dokeroglu et al. 2022). One of the best techniques for tackling such combinatorial problems is the use of metaheuristic algorithms. Furthermore, research demonstrates that metaheuristic algorithms outperform exhaustive or greedy methods. Modern metaheuristic algorithms are heavily influenced by nature, and they are frequently employed in the field of feature selection today. The proposed model exploits three different selection methods: mutual information regression, linear regression, and correlation-based selection. The proposed deep model is evaluated with each of these feature selection methods. The models built on these three feature selection methods are designated Model-1, Model-2, and Model-3 in Sects. 4-6. Finally, the results that demonstrate the superiority of these feature selection methods are highlighted in Sect. 6.

Linear regression
Linear regression is one of the simplest and most popular machine learning algorithms. It is a statistical method used for predictive analysis. Linear regression is used for continuous, real-valued numerical variables such as sales, salaries, ages, and product prices. Linear regression algorithms aim to find the best-fitting line, that is, the values of the intercept and coefficients that minimize the error. The error in linear regression is the difference between the actual and predicted values, and the algorithm aims to reduce this difference. The main reason for using linear regression is that it can appropriately test feature selection methods, especially when the irrelevant features are eliminated (Toğaçar et al. 2020). Other feature selection methods, such as Boruta, exist in the related work, but they do not reach the accuracy of the ones selected here. For example, Boruta adopts a simple procedure for feature selection that is not suitable for Twitter data. In Boruta, a feature is considered useful only when it has a relatively high feature importance score; when it has a relatively low feature importance score, it is considered unsuitable (Heidari et al. 2022).
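As a small worked illustration of fitting the best line by minimizing squared error, the sketch below uses the closed-form least-squares solution for a single input variable. The function name and the toy data are assumptions for the example; the paper's models use many features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums give the slope that minimizes squared error.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
residuals = [y - (slope * x + intercept)
             for x, y in zip([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])]
print(slope, intercept, residuals)
```

When linear regression is used for feature selection, the magnitude of each fitted coefficient (on standardized inputs) is a common proxy for how much a feature contributes to the prediction.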

Correlation-based feature selection method
In this feature selection method, a subset of features is called a good subset if its features, on the one hand, have a high correlation with the class or target feature and, on the other hand, are uncorrelated with each other. The ''merit'' or goodness of a subset of features is calculated from these feature-target and feature-feature correlations. The correlation-based method is adopted because it can accurately identify the correlation between the Bitcoin value and the features. It is therefore useful for identifying an important feature based on the correlation between the feature and the value of the class, here the Bitcoin price. The features must be selected prior to or during model training to make better predictions of the Bitcoin price, whereas feature importance only becomes available from a learned model after training. Therefore, feature selection is preferred to feature importance in this study (Saarela and Jauhiainen 2021). Compared to other feature selection methods, correlation-based feature selection can better eliminate irrelevant and redundant features with remarkable accuracy and less complexity (Duangsoithong and Windeatt 2010). Correlation can also be used to predict one feature from another when imputing missing values and to indicate a possible causal relationship. The method describes the prevalence of and relationships between the variables for predicting events based on the present data and knowledge (Curtis et al. 2016).
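The merit described here matches the standard correlation-based feature selection (CFS) heuristic, in which a subset of k features scores k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean absolute feature-target correlation and r_ff the mean absolute feature-feature correlation. Assuming this standard form is the one intended, a minimal sketch with Pearson correlation looks as follows; the function names and data are illustrative.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cfs_merit(feature_columns, target):
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(feature_columns)
    r_cf = sum(abs(pearson(col, target)) for col in feature_columns) / k
    pairs = list(combinations(feature_columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


price = [1.0, 2.0, 3.0, 4.0]
informative = [2.0, 4.0, 6.0, 8.0]   # perfectly correlated with price
duplicate = [1.0, 2.0, 3.0, 4.0]     # redundant copy of the same signal

single = cfs_merit([informative], price)
redundant_pair = cfs_merit([informative, duplicate], price)
print(single, redundant_pair)
```

In this toy case the merit stays the same when the redundant copy is added, which shows how the denominator penalizes features that are correlated with each other.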

Feature selection with mutual information
The features provide the model with a great deal of information about the output, and the model estimates the output in classification and regression tasks based on this information. The mutual information method takes a completely different approach from the previous methods: it examines the relationship between a feature and the output instead of analyzing the mean and variance. Each feature is scored by the amount of mutual information it shares with the output, so the method can determine how appropriate a feature is for estimating the output. Mutual information is a vital criterion of interdependence between features and is widely used in feature selection (Vergara and Estévez 2014). The mutual information feature selection method is one of the effective feature selection methods used in the proposed model (Kraskov et al. 2004). Mutual information differs from correlation in that correlation measures linear dependence, while mutual information measures general dependence. Regarding the sentiment analysis of the tweets, the variables need to be compared in order to predict the Bitcoin price better. As reported in the literature, mutual information has many applications in tweet classification and feature selection (Utama 2019; Shamoi et al. 2022; Ahanin and Ismail 2022).
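The difference between linear and general dependence can be illustrated with a small sketch. The estimator below uses a simple equal-width histogram, which is an assumption chosen for brevity; it is not the nearest-neighbour estimator of Kraskov et al. (2004) that the paper cites, and the function names are hypothetical.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value to one of `bins` equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(idx, bins - 1)  # put the maximum into the last bin


def mutual_information(x, y, bins=3):
    """Histogram estimate of the mutual information (in nats) of x and y."""
    n = len(x)
    bx = [bin_index(v, min(x), max(x), bins) for v in x]
    by = [bin_index(v, min(y), max(y), bins) for v in y]
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log(p_ij / ((px[i] / n) * (py[j] / n)))
    return mi


# Toy data (hypothetical): y depends on x, but not linearly.
x = list(range(-4, 5))
y = [v * v for v in x]

# The mean of x is zero, so this sum is the (unnormalised) covariance.
covariance_term = sum(a * b for a, b in zip(x, y))
mi_xy = mutual_information(x, y)
mi_constant = mutual_information(x, [1.0] * len(x))
```

Here the covariance term is zero, so a correlation-based score would treat `x` as uninformative, while the mutual information estimate is clearly positive. A constant column yields zero mutual information.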

Statistical analysis
This section considers the various criteria for examining the proposed method based on multiple feature extraction and deep learning. Most authors employ the MSE to compare different models. In this paper, several main prediction criteria are considered: mean square error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), median absolute error (MedAE), and the coefficient of determination (R²). The formulas of these criteria and their explanations are presented below (Bui et al. 2018; Chou and Bui 2014; Chou et al. 2016). Also, Table 1 summarizes the evaluation criteria of the proposed method.
Mean square error (MSE) This criterion calculates the mean of the squared differences between the values predicted by the proposed method and the actual Bitcoin values. The smaller the MSE, the more accurate the Bitcoin prediction of the proposed method. This criterion is calculated through Eq. (2):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (2)$$

where $n$ denotes the number of samples, $y_i$ is the experimental or actual Bitcoin value, and $\hat{y}_i$ represents the Bitcoin value predicted by the proposed method.
Root-mean-square error (RMSE) The square root of the MSE is called the RMSE. The MSE cannot be compared directly with the MAE because it is expressed on the squared scale of the error. Hence, it is necessary to define the RMSE criterion, which is given in Eq. (3):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (3)$$

where $n$ is the number of samples, $y_i$ is the experimental or actual Bitcoin value, and $\hat{y}_i$ is the value predicted by the proposed method. Notably, the greater the variance of the individual errors, the larger the gap between the MAE and RMSE criteria.
Mean absolute error (MAE) This criterion calculates the mean absolute difference between the values predicted by the proposed method and the actual Bitcoin values. The smaller the MAE, the more accurate the prediction of the proposed method. This criterion is calculated through Eq. (4):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \qquad (4)$$

According to Eq. (4), $n$ is the number of samples, $y_i$ is the experimental or actual value of Bitcoin, and $\hat{y}_i$ is the value predicted by the proposed method.
Median absolute error (MedAE) This criterion takes the median of the absolute differences between the values predicted by the proposed method and the actual Bitcoin values. It is shown in Eq. (5):

$$\mathrm{MedAE} = \mathrm{median}\left(\left|y_1 - \hat{y}_1\right|, \ldots, \left|y_n - \hat{y}_n\right|\right) \qquad (5)$$

According to Eq. (5), $n$ is the number of samples, $y_i$ is the experimental or actual Bitcoin value, and $\hat{y}_i$ is the value predicted by the proposed method.
Coefficient of determination (R²) This criterion measures how well the values predicted by the proposed method agree with the actual values of Bitcoin. In contrast to the other criteria, the closer its value is to one, the better. This criterion is calculated through Eq. (6):

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (6)$$

Accordingly, $\bar{y}$ represents the mean of the actual values, $n$ denotes the number of samples, $y_i$ is the experimental or actual Bitcoin value, and $\hat{y}_i$ is the value predicted by the proposed method.
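The five criteria above can be computed together in a few lines. The following sketch uses only the Python standard library; the function name `regression_metrics` and the toy values are assumptions for illustration and do not reproduce the paper's results.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, MedAE and R^2 for two equally long sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs_errors) / n,
        "MedAE": statistics.median(abs_errors),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Toy data (hypothetical): three exact predictions and one large miss.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
metrics = regression_metrics(y_true, y_pred)
```

In this toy case a single large error gives MSE = 1.0, RMSE = 1.0, MAE = 0.5, and MedAE = 0.0, which illustrates the remark that the gap between MAE and RMSE grows with the variance of the individual errors, while the median is unaffected by the outlier.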

VADER-based sentiment analysis
Sentiment analysis aims to identify and extract users' opinions (Cambria et al. 2018). The primary aim of this section of the proposed method is to analyze the sentiment of users' tweets. A variety of methods have been proposed for sentiment analysis, among which VADER is one of the most successful. VADER is a lexicon- and rule-based tool that can accurately extract sentiments from text, emoticons, emojis, abbreviations, and terms (Hutto and Gilbert 2014; Hota et al. 2021). Because it relies on a lexicon and rules, the tool is fast, and its output is a four-dimensional vector in which positive, negative, neutral, and compound values are generated for each input text. It should be noted that the positive, negative, and neutral values lie between zero and one, while the compound value is normalized between -1 and 1. Therefore, in the proposed method, the tweet

Data collection
This section provides necessary information regarding the data used for analyzing the problem and the proposed models.

Data set
In this part of the proposed method, four types of data, namely news, tweets, Google Trends, and Bitcoin stock information, were collected by different methods and through APIs, as described below. All data were obtained daily from "05/02/2021" to "10/09/2021".
Bitcoin information The yfinance library is used to extract Bitcoin stock features, including open, close, high, low, volume, and price. Bitcoin stock information was extracted daily from "05/02/2021" to "10/09/2021". Therefore, four Bitcoin features and a close feature are considered as real values for forecasting at this phase. Table 1 highlights an overview of this feature.
Tweet information The Twitter API was employed to collect tweets daily from "05/02/2021" to "10/09/2021". The collected data include 1.2 million tweets related to the words BTC and Bitcoin. Finally, the tweet information is grouped by day. In addition to the tweets' texts, the meta features of the users are also collected at this step. Meta tweet information includes total followers, average followers, and so on; the exact features are listed in Table 3.
Google Trends information The rankings of the two words "Bitcoin" and "BTC" were extracted using the pytrends library at this step. This information was also extracted daily from "05/02/2021" to "10/09/2021". More detailed information regarding these features is given in Table 3.
News headline and text information In this step, the headlines and texts of news related to Bitcoin were extracted from reputable sites such as Cointelegraph using the Beautiful Soup and urllib libraries. The news headlines and texts were extracted separately. In the next step, they were preprocessed, and then the TF-IDF method was used to extract the effective features or words.
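The daily grouping of tweet metadata mentioned above can be sketched as follows. The record layout (`date`, `followers`) and the function name are hypothetical; only the output feature names user-followers-sum and user-followers-mean are taken from the paper, and the actual collection code is not given in the source.

```python
from collections import defaultdict

# Hypothetical per-tweet records; field names are illustrative only.
tweets = [
    {"date": "05/02/2021", "followers": 120},
    {"date": "05/02/2021", "followers": 80},
    {"date": "06/02/2021", "followers": 1000},
]


def daily_meta_features(records):
    """Group tweets by day and aggregate the follower counts per day."""
    by_day = defaultdict(list)
    for record in records:
        by_day[record["date"]].append(record["followers"])
    features = {}
    for day, followers in by_day.items():
        features[day] = {
            "user-followers-sum": sum(followers),
            "user-followers-mean": sum(followers) / len(followers),
        }
    return features


daily = daily_meta_features(tweets)
```

Each day then contributes one row of aggregated features, which matches the daily resolution of the Bitcoin, Google Trends, and news data.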

Text preprocessing and textual feature extraction
At this step of the proposed method, a series of preprocessing operations, including data cleaning, tokenization, stop word removal, and stemming, is applied to every tweet and textual news item. In natural language processing, algorithms do not have any understanding of the text; thus, the first and most important step is to identify or separate the tokens (signs and words), which is the task of the tokenization step (Jurafsky 2000; Manning et al. 2014). The next step is to eliminate the stop words, which are frequently repeated words that carry no information and are only used to connect the words in a sentence (Rani and Lobiyal 2018). Stemming is the last step of the preprocessing phase. The stem refers to the main meaning and concept of a word; a limited number of stems exist in natural language, and the rest of the words are derived from these stems (Xu and Croft 1998; Porter 1980). The major aim of stemming is to extract the stem and remove the affixes attached to the word (Manning and Schutze 1999; Porter 2001). Thus, stemming is one of the main steps in natural language processing. The steps of the text processing are given below.
Data cleaning step In this step of the proposed method, blank textual data, numerical data, link addresses, and so on are eliminated from the textual news and tweets to prepare the text for the next processing steps.
Tokenization step In this step of the proposed method, the sentences in each text are tokenized.
Stop word removal step In this step of the proposed method, stop word removal is conducted using the nltk library.
Stemming step In this step of the proposed method, word stemming is conducted using the nltk library and the English Porter Stemmer.
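The four steps above can be sketched in a few lines. To keep the example self-contained, it uses a tiny illustrative stop-word list and a crude suffix-stripping rule as stand-ins for the nltk stop-word list and the Porter stemmer used in the paper; all function names are hypothetical and the output differs from what nltk would produce.

```python
import re

# Tiny illustrative stop-word list (stand-in for the nltk English list).
STOP_WORDS = {"the", "is", "a", "to", "and", "of", "in"}


def clean(text):
    """Data cleaning: drop links and numbers, lower-case the text."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"\d+", " ", text)
    return text.lower().strip()


def tokenize(text):
    """Tokenization: split the cleaned text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text)


def remove_stop_words(tokens):
    """Stop word removal: keep only tokens outside the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Crude suffix stripping; a stand-in for the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    return [naive_stem(t) for t in remove_stop_words(tokenize(clean(text)))]


tokens = preprocess("Bitcoin is rising to 50000 https://example.com and traders jumped in")
```

The order of the steps matters: cleaning first removes links and digits so that they never become tokens, and stemming comes last so that stop words are matched in their original form.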
After preparation, the tweet data are sent to VADER for sentiment analysis, whereas the textual news data are represented with the TF-IDF extraction method. TF-IDF converts text to numerical values based on the importance of the words. This type of weighting is based on the belief that the words that distinguish a document from other headlines and news content are important words and, thus, carry more weight (Salton and Buckley 1988). According to Eq. (7), the importance of a word is measured from the number of its repetitions in a headline or news item and from its occurrence across all documents in the data set:

$$\mathrm{TFIDF}(t_i, d_j) = \mathrm{TF}(t_i, d_j) \times \mathrm{IDF}(t_i) \qquad (7)$$

where $\mathrm{TF}(t_i, d_j)$ measures the significance of the word $t_i$ within the headline or news content $d_j$, and $\mathrm{IDF}(t_i)$ measures the significance of the word based on the headlines and news contents that include that word.
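A minimal sketch of this weighting is given below. It assumes relative term frequency for TF and the plain logarithmic form log(N / df) for IDF; the paper does not state which variant it uses, and library implementations often apply smoothed versions. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy headlines (hypothetical), already preprocessed into tokens.
docs = [
    ["bitcoin", "price", "rise"],
    ["bitcoin", "price", "fall"],
    ["bitcoin", "regul"],
]
weights = tf_idf(docs)
```

In this toy corpus the word "bitcoin" occurs in every document and therefore receives a weight of zero, while "rise", which occurs in only one headline, receives the largest weight in the first document. This is the distinguishing behaviour described above.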

Data normalization
Normalization and standardization are crucial steps in preprocessing data sets for machine learning and deep learning algorithms. Normalization scales the data values into a specific range. With normalized data, most machine learning and deep learning algorithms predict prices more accurately. Min-max normalization is one of the most popular scaling methods and maps the data to the range [0, 1]. It is defined as follows:

$$X_{\mathrm{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (8)$$

According to Eq. (8), $X_{\min}$ represents the lowest value of a feature in the Bitcoin data, $X$ denotes the value of that feature, and $X_{\max}$ indicates the maximum value of the feature.
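As a small illustration of min-max normalization, the sketch below scales one feature column to [0, 1]. The function name and the example values are assumptions; the handling of a constant column (mapped to zeros) is a choice made here to avoid division by zero and is not specified in the paper.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to scale, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy closing prices (hypothetical values).
prices = [30000.0, 45000.0, 60000.0]
scaled = min_max_scale(prices)
```

A target scaled this way lies in [0, 1], which is compatible with a Sigmoid output layer such as the one used in the proposed models.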

Proposed models based on deep learning
The first part of the proposed method showed which types of features the various collected data contain, including metadata tweets, sentiment tweet data (VSdata), news title data, news content data, Bitcoin data, and Google Trends data, as summarized in Table 3. In this part of the proposed method, nine different deep learning models are designed according to the input data type. More model variants could be explored in order to find the most efficient model for predicting the Bitcoin price, but only a limited number can be discussed here because of the length limits and standards of a paper. Each model may have different layers depending on the input data type, such as text. In addition, several models are designed based on either the whole feature set or a selection of the essential features. This selection is based on different feature selection methods, namely mutual-info-regression, linear regression, and correlation, and finally, a model is designed based on the combination of the three. Most of the models in this section are intended to indicate the impact of each data source separately on the Bitcoin prediction, whereas combining all existing features across all data makes it possible to specify how effective these features are. In Sect. 6, the comparison results of the different criteria show which models with which features predict the Bitcoin price more accurately.

The proposed Model-1 with Bitcoin data
In this model, a deep network based on convolutional layers and LSTM layers is designed with Bitcoin data as input, as shown in Fig. 2; it is called Model-1 in this paper. In this model, only the Bitcoin stock data, including open, close, high, low, volume, and price, are used to predict the Bitcoin price. The first model is composed of different layers, including three conv1-d layers, two max-pooling layers, a flatten layer, a dense layer, and an LSTM layer.
In the proposed Model-1 architecture shown in Fig. 2, the convolutional layers are used to extract better features, and the LSTM layer is used to maintain the temporal state of the data. The three conv1-d layers have 500, 200, and 100 filters, the two max-pooling layers have a kernel size of two, the LSTM layer has 32 units, and the dense layer has 20 units. Besides, ReLU is used as the activation function in all layers except the last one, which uses the Sigmoid activation function in accordance with the data type. Details of the loss function, the number of epochs, and the other hyper-parameters of Model-1 are given in Sect. 6.

The proposed Model-2 with metadata
This model presents a deep network based on convolutional layers and dense layers with metadata tweets as input. According to Fig. 3, this model is called Model-2 in this paper. In this model, the metadata tweets, including user-followers-sum, user-followers-mean, user-friend-sum, user-friend-mean, user-favorites-mean, user-verified-most, and user-verified-mean, are used to predict Bitcoin prices. The second model consists of different layers, including three conv1-d layers, two max-pooling layers, a flatten layer, a dense layer, and a dropout layer.
In the proposed Model-2 architecture shown in Fig. 3, the convolutional layers are used to extract better features, and the dense and dropout layers are used to address overfitting for better network training. The three conv1-d layers have 500, 200, and 100 filters, the two max-pooling layers have a kernel size of two, the dropout layer is set to 0.1%, and the two dense layers have 100 and 20 units. Also, ReLU is used as the activation function in all layers except the last one, which uses the Sigmoid activation function in accordance with the data type. More details regarding the loss function, the number of epochs, and the other hyper-parameters of Model-2 are presented in Sect. 6.

The proposed Model-3 with VADER data
In this model, a deep network is designed based on convolutional and dense layers with data input of sentiment analysis of tweet text with Bitcoin data.As shown in Fig. 4, this model is called Model-3 in this paper, in which the sentiment analysis data of tweet text, including positive, negative, neutral, and compound values obtained from the VADER tool, are only used to predict Bitcoin prices.The third model consists of different layers, including three conv1-d layers, two max-pooling layers, a flattened layer, and a dense layer.
In the proposed Model-3 architecture shown in Fig. 4, which is based on convolutional layers with sentiment analysis data of tweet text and Bitcoin data, the convolutional layers are used to extract better features and the dense layer is used for the linear state. The three conv1-d layers have 500, 200, and 100 filters, the two max-pooling layers have a kernel size of two, and the dense layer has 100 units. Also, the last two-layer activation function is set with ReLU, and the last layer uses the Sigmoid activation function in accordance with the data type. In this model, two LeakyReLU activation functions are used after the max-pooling layers. More details concerning the loss function, the number of epochs, and the other hyper-parameters of Model-3 are presented in Sect. 6.

The architecture of the proposed Model-4 is based on a deep two-channel model with Bitcoin data and metadata tweets. Figure 5 indicates that this model uses dense layers and a two-channel structure to predict the Bitcoin price better. The first channel has dense layers with 500, 300, 200, and 100 units. The second channel has dense layers with 200 and 100 units. The two channels are joined by a concatenate layer, which is followed by a dropout layer with 0.1% and a dense layer with 20 units. Also, ReLU is used as the activation function in all layers except the last one, which uses the Sigmoid activation function in accordance with the data type. More details about the loss function, the number of epochs, and the other hyper-parameters of Model-4 are given in Sect. 6. In contrast to the third model, sentiment analysis is not considered.

The proposed architecture of Model-5, based on convolutional layers with tweet textual data and an embedding layer, is shown in Fig. 6, in which the convolutional layers and the embedding layer are used to extract better features. Also, the dense and dropout layers are used to tackle the

The proposed architecture of Model-9 with the various data and a combination of three feature selection models is presented here. All three feature selection methods are used in this model, and their selected features are combined in order to predict from all data more accurately. In this way, the advantages of the three feature selection models are exploited to select important features and remove redundant ones. The three conv1-d layers have 500, 200, and 100 filters, the two max-pooling layers have a kernel size of two, and the dense layer has 100 units. ReLU is used as the activation function in all layers except the last one, which uses the Sigmoid activation function in accordance with the data type. The necessary details regarding the loss function, the number of epochs, and the other hyper-parameters of Model-9 are presented in Sect. 6.

Evaluation and validation
This section examines the nine proposed models based on convolutional neural networks and LSTM for predicting Bitcoin prices. The proposed models were implemented in the Google Colab environment with 12 GB of RAM using the TensorFlow and Keras libraries. TensorFlow is one of the most widely used neural network libraries in the Python programming language, and both researchers and companies use it to build a variety of neural network architectures. In the experiment, the price of Bitcoin was predicted with a window length of 1 because the inputs were available for 78 days. Also, 80% of the data was used for learning and 20% for testing (the experimental data). Some of the parameters of the proposed models were introduced in Sect. 3. Table 4 lists the hyper-parameters of each proposed model used in the implementation.
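The windowing and the 80/20 split can be sketched as follows. Two points are assumptions, since the paper does not spell them out: a window length of 1 is read here as "the features of the previous day predict the price of the next day", and the split is taken chronologically rather than at random. The function names and the placeholder data are hypothetical.

```python
def make_windows(rows, targets, window=1):
    """Pair each target with the `window` feature rows that precede it."""
    inputs, outputs = [], []
    for i in range(window, len(rows)):
        inputs.append(rows[i - window:i])
        outputs.append(targets[i])
    return inputs, outputs


def chronological_split(inputs, outputs, train_fraction=0.8):
    """Keep the first part for learning and the rest for the experiment."""
    cut = int(len(inputs) * train_fraction)
    return inputs[:cut], outputs[:cut], inputs[cut:], outputs[cut:]


# Placeholder data for 78 days: one feature per day, target = day index.
rows = [[float(day)] for day in range(78)]
targets = [float(day) for day in range(78)]

X, y = make_windows(rows, targets, window=1)
X_train, y_train, X_test, y_test = chronological_split(X, y)
```

With 78 daily rows and a window of 1, this yields 77 samples, of which 61 are used for learning and 16 for the experiment. A chronological split keeps future prices out of the learning data, which a random split would not guarantee.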
According to Table 4, in order to make a fair comparison, all models are set with the same optimizer and loss. In this section, the nine proposed models based on convolutional neural networks and LSTM are compared for predicting the Bitcoin price in terms of various criteria, including MSE, RMSE, MAE, MedAE, and the coefficient of determination (R²). The first experiment compares the loss function of all nine proposed models on the learning and test data, as shown in Figs. 9 and 10.
According to Fig. 9, the comparison of the proposed models in terms of loss function on the learning data shows that the eighth and ninth models have better results than the other models, while the second and third models have the worst performance. Notably, some models, such as the second model, are mainly considered to show the direct effect of words on the Bitcoin price. It should be noted that most algorithms can be optimized well on the learning data; the key question is which model performs best on the experimental data. In the following, the proposed models are therefore compared in terms of loss function on the experimental data.
As shown in Fig. 10, the comparison of the proposed models in terms of loss function on the experimental data shows that the eighth and ninth models performed better than the other models. However, some models, like the fifth model, could not perform better on the learning data since they only used sentiment analysis. Besides, Models 2, 5, and 7 have acceptable performance on the experimental data. It is worth mentioning that there is direct interference of metadata tweets in predicting the Bitcoin price in Model-9 and a lack of interference of other features. The second experiment then compares the proposed models in terms of various criteria, including MSE, RMSE, MedAE, and the coefficient of determination (R²), as shown in Table 5.
Table 5 summarizes the comparison between the proposed models in terms of the different criteria. The ninth model obtains the best MSE, with a value of 0.001. It also outperforms the other models in the MAE and R² criteria, with values of 0.02 and 0.98, respectively. The second model obtains the worst MSE, with a value of 0.04, and also performs worst in terms of MAE and R², with values of 0.17 and 0.4190, respectively. Among the sixth, seventh, and eighth models, which rely on the individual feature selection methods, Model-6 and Model-8 show much better performance: the MSE of Model-6 is 0.00258 and that of Model-8 is 0.00251. These results imply that the mutual-info-regression and linear regression feature selection methods identified the important features correctly. Additionally, the fourth model in which VADER sentiment analysis is used has improved the whole criteria compared to the first model without …

Figs. 11, 12, 13, 14, 15, 16, 17, 18 and 19. The prediction diagrams of the proposed models are presented on the whole Bitcoin data. Based on the loss function and the comparison of the criteria in Table 5, it can be concluded that the ninth model performs better than the other models in terms of the accuracy of the price prediction, followed by the sixth and eighth models.

Fig. 1 Flowchart of the proposed method based on multiple feature extraction and deep learning

Fig. 2 Proposed Model-1 architecture based on the convolutional layer and LSTM layer with Bitcoin data

Fig. 4 Proposed Model-3 architecture based on convolutional layers with sentiment analysis data of tweet text

The proposed Model-4 with meta + Bitcoin data
This model presents a deep two-channel fully connected (dense) network with Bitcoin data and metadata tweets as input. As shown in Fig. 5, this model is named Model-4 in this paper. In contrast to the other models, the proposed model has two input channels. The first channel takes the metadata tweets as input, including user-followers-sum, user-followers-mean, user-friend-sum, user-friend-mean, user-favorites-mean, user-verified-most, and user-verified-mean. The second channel takes the stock data, including open, close, high, low, volume, and price, to predict the Bitcoin price. Finally, the two channels are combined by the concatenate layer.

The proposed Model-5 with textual tweet data and embedding layer
As shown in Fig. 6, this model is a deep network based on convolutional layers with tweet textual data and an embedding layer. It is named Model-5 in this paper. In this model, the textual tweet data are used directly, after the preprocessing operation, through the embedding input layer. The fifth model consists of an embedding input layer followed by other layers, including three conv1-d layers, two max-pooling layers, a flatten layer, two dense layers, and a dropout layer. The major aim of this model is to use the words and sentences in the text of the tweets directly to predict the Bitcoin price.

Fig. 5 Proposed architecture of deep two-channel Model-4 with meta + Bitcoin data

Fig. 6 Proposed architecture of Model-5 with textual tweet data and an embedding layer

Fig. 8 Proposed architecture of Model-9 with the various data and a combination of three feature selection models

Fig. 9 Comparison of proposed models in terms of loss function for the learning data

Fig. 10 Comparison of proposed models in terms of loss function on the experimental data

Fig. 11 Bitcoin price prediction based on the first proposed model

Fig. 12 Bitcoin price prediction based on the second proposed model

Fig. 13 Bitcoin price prediction based on the third proposed model

Fig. 14 Bitcoin price prediction based on the fourth proposed model

Fig. 15 Bitcoin price prediction based on the fifth proposed model

Fig. 16 Bitcoin price prediction based on the sixth proposed model

Fig. 17 Bitcoin price prediction based on the seventh proposed model

Fig. 18 Bitcoin price prediction based on the eighth proposed model

Table 1 Summary of the criteria for evaluating the proposed method

VADER-based sentiment analysis is performed in the proposed method according to Table 2, and finally, each of the positive, negative, neutral, and compound values is selected as a final feature. The effectiveness of these features is examined by the feature selection methods in the feature selection step, and they are included in the final feature list if they prove important.

Table 2 Way of analyzing VADER-based sentiments in the proposed method

Table 3 Different extraction features in the proposed method

According to Table 1, the whole features are combined if the minimum number of text features and headlines is 100 features. These models contain at least 115 features, among

Table 4 Validation of hyper-parameters of proposed models

Table 5 Comparison of different proposed models in terms of the various criteria