A new intuitionistic fuzzy time series method based on the bagging of decision trees and principal component analysis

Intuitionistic fuzzy time series methods provide a good alternative to the forecasting problem. It is possible to use the historical values of the time series as well as the membership and non-membership values obtained for the historical values as effective factors in improving the forecasting performance. In this study, a high order single variable intuitionistic fuzzy time series reduced forecasting model is first introduced. A new forecasting method is proposed for the solution of the forecasting problem in which the functional structure between the historical information of the intuitionistic time series and the forecast is obtained by bagging of decision trees based on the high order single variable intuitionistic fuzzy time series reduced forecasting model. In this study, intuitionistic clustering is employed to create intuitionistic fuzzy time series. To create a simpler functional structure with Bagging of decision trees, the input data from lagged variables, memberships, and non-membership values are subjected to dimension reduction by principal component analysis. The performance of the proposed method is compared with popular forecasting methods in the literature for ten different time series randomly obtained from the S&P500 stock market. According to the results of the analyses, the forecasting performance of the proposed method is better than both classical forecasting methods and some popular shallow and deep neural networks.


Introduction
Traditional forecasting methods are based on obtaining linear functions of lagged variables or deterministic functions of time. In recent years, it has become popular to model time series using various membership values obtained by fuzzification in addition to historical values. Moreover, in the last five years, fuzzy time series forecasting methods have been replaced by forecasting methods based on intuitionistic fuzzy sets. Memberships and their nonlinear transformations can be used as explanatory variables in time series analysis. Generally, in modelling studies, nonlinear transformations of inputs can be used for data augmentation and contribute to modelling performance. Fuzzy set theory was proposed by Zadeh (1965). A fuzzy set generalizes the classical set by allowing each element of the universal set to belong to the set with a certain degree of membership and non-membership. In a fuzzy set, the sum of the degrees of membership and non-membership of an element of the universal set is one. Fuzzy sets have been used for different purposes in the literature. Chen and Chen (2001), Chen et al. (2009), Chen and Niou (2011) and Shen et al. (2013) used fuzzy sets in their studies.
In Atanassov (1986), the fuzzy set is generalized further and the intuitionistic fuzzy set is defined. In intuitionistic fuzzy set theory, the sum of the degrees of membership and non-membership for an element of the universal set can be less than one, and the idea of a hesitation degree emerges. In intuitionistic fuzzy sets, the degree of hesitation increases when the values of membership and non-membership approach each other and when there is no certainty about whether an element of the universal set belongs to a set.
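The relation between the three degrees can be written compactly. The block below is a short illustration using the standard notation for intuitionistic fuzzy sets; the symbols $\mu_A$, $\nu_A$ and $\pi_A$ are introduced here for illustration and are not taken from the equations of this paper.

```latex
% Fuzzy set: membership and non-membership are complementary
\mu_A(x) + \nu_A(x) = 1
% Intuitionistic fuzzy set: the sum may fall short of one
0 \le \mu_A(x) + \nu_A(x) \le 1, \qquad
\pi_A(x) = 1 - \mu_A(x) - \nu_A(x)
% Worked example: \mu_A(x) = 0.6,\ \nu_A(x) = 0.3 \ \Rightarrow\ \pi_A(x) = 0.1
```

When $\pi_A(x) = 0$ for every element, the intuitionistic fuzzy set reduces to an ordinary fuzzy set.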
There are many studies in the literature where fuzzy and intuitionistic fuzzy sets are handled in various ways to solve the forecasting problem. Song and Chissom (1993a, b) are important studies in which a forecasting method is developed based on fuzzy sets. In these studies, fuzzy time series are defined and a method for obtaining forecasts for fuzzy time series is presented. Fuzzy set-based forecasting methods have become more effective with the use of different tools, and the subjective decisions and biased evaluations found in the early methods have been abandoned. Chen and Wang (2010) and Chen and Jian (2017) used fuzzy sets for forecasting purposes in their algorithms.
Intuitionistic fuzzy sets have been used for different purposes in the literature. Chen and Randyanto (2013), Chen et al. (2016), Chen and Wang (2010), Zou et al. (2020), Liu et al. (2020), Kumar and Chen (2021) and Kumar and Chen (2022) used intuitionistic fuzzy sets for creating decision-making methods. Gangwar and Kumar (2014) first proposed an intuitionistic fuzzy time series method. In addition, fuzzy and intuitionistic fuzzy time series forecasting models are defined by Egrioglu et al. (2019). Many methods have been proposed in the literature for intuitionistic fuzzy time series forecasting. Gangwar and Kumar (2014) used probabilistic and intuitionistic fuzzy sets to create a forecasting method. Kumar and Gangwar (2015) proposed an intuitionistic fuzzy time series forecasting method that uses intuitionistic fuzzy logical relations. Lei et al. (2016) proposed a multifactor high-order intuitionistic fuzzy time series forecasting model. Wang et al. (2016) merged two topics: intuitionistic fuzzy time series and fuzzy reasoning. Fan et al. (2016), Fan et al. (2017) and Bisht et al. (2018) introduced new forecasting methods based on intuitionistic and hesitant fuzzy sets. Abhishekh et al. (2018) used a score function in an intuitionistic fuzzy time series method. Kumar et al. (2019) proposed an intuitionistic fuzzy time series forecasting method based on a dual hesitant fuzzy set. Fan et al. (2020) proposed a network traffic forecasting model based on long-term intuitionistic fuzzy time series. Bas et al. (2021) proposed an intuitionistic fuzzy time series functions approach for time series forecasting. Kocak et al. (2021) proposed a method based on an intuitionistic fuzzy set and a long short-term memory deep neural network. Pant et al. (2021) proposed a novel method to optimize interval length in intuitionistic fuzzy time series forecasting models. Chen et al. (2021) used a new data transformation method in the intuitionistic fuzzy time series method. Pant and Kumar (2022) combined particle swarm optimization and intuitionistic fuzzy sets in their methods. Bas et al. (2022) used a pi-sigma artificial neural network in an intuitionistic fuzzy time series method. Arslan and Cagcag Yolcu (2022) used a sigma-pi neural network for defining fuzzy relations based on intuitionistic fuzzy sets. Pant et al. (2022) forecasted deaths due to COVID-19 in India by using intuitionistic fuzzy sets. Nik Badrul Alam et al. (2022) predicted Malaysian crude palm oil prices based on intuitionistic fuzzy sets. Vamitha and Vanitha (2022) suggested a method based on intuitionistic fuzzy sets for forecasting temperature. Cagcag Yolcu and Yolcu (2023) proposed an intuitionistic fuzzy time series forecasting method based on a cascaded neural network. Dixit and Jain (2023) addressed non-stationary time series by using an intuitionistic fuzzy time series forecasting method.
A new forecasting method based on intuitionistic fuzzy sets and bagged decision trees is proposed in this paper. The motivation of the study is to obtain a method with high forecasting performance for intuitionistic fuzzy time series using the bagging of decision trees. The contribution of the study is to propose an intuitionistic fuzzy time series method using the bagging of decision trees for the first time in the literature. The intuitionistic fuzzy c-means method is used to create an intuitionistic fuzzy time series. In addition, principal component analysis is used to reduce the dimension of the model inputs.
Moreover, a high order single variable intuitionistic fuzzy time series reduced forecasting model is introduced for the first time in this study. In the second section, descriptive information about decision trees and the bagging of decision trees is given. In the third section, the proposed method is presented in algorithmic form. In the fourth section, the details of applications on the stock markets are presented.

Decision trees and bagging of decision trees
Decision trees (DTs) are machine learning algorithms that can perform classification, regression and forecasting tasks. Combining many trees often yields satisfactory prediction accuracy. Regression trees provide an approximation to a real-valued function for a regression task. Regression trees are constructed using a process known as binary recursive partitioning. This iterative process first divides the training dataset into partitions or branches and continues to split each partition into smaller groups as it moves up each branch. Bootstrap sampling is an effective and practical resampling method that requires minimal assumptions about the dataset and is useful where the sample size is small or it is not possible to collect new data. The main purpose of bootstrap methods is to generate a large number of data sets by resampling from the data. Bagging, first proposed by Breiman (1996), is short for Bootstrap AGGregatING. Bagging is a procedure for reducing the variance of a statistical learning method and can be used for classification and regression. Suppose a sample $x_1, x_2, \ldots, x_n$ with $n$ observations is given and let $f(x)$ be the unknown distribution of the sample. The aim is to sample from this distribution, but $f(x)$ is unknown. Approximating sampling from the distribution $f(x)$ by randomly sampling from the sample $x_1, x_2, \ldots, x_n$ is called bootstrapping (Ghojogh and Crowley 2019). In bootstrapping, simple random sampling with replacement is used and the drawn sample is called the bootstrap sample.
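Bootstrapping as described above can be illustrated with a few lines of Python. This is a minimal sketch using only the standard library; the function name `bootstrap_sample`, the seed and the sample values are illustrative assumptions and do not come from the study.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) observations from `data` by simple random
    sampling with replacement (one bootstrap sample)."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(0)               # fixed seed so the sketch is reproducible
sample = [2.1, 3.4, 1.9, 5.0, 4.2]   # illustrative values only

# One bootstrap sample: same size as the original, repeats allowed.
boot = bootstrap_sample(sample, rng)

# Repeating the draw approximates the sampling distribution of a statistic,
# here the mean, without knowing the underlying distribution f(x).
boot_means = [mean(bootstrap_sample(sample, rng)) for _ in range(1000)]
```

Because sampling is with replacement, some observations typically appear more than once in a bootstrap sample while others are left out.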
In bagging, $k$ bootstrap samples are drawn, and then the model $h_j$, for all $j \in \{1, \ldots, k\}$, is trained on the $j$-th bootstrap sample. Hence, there are $k$ trained models rather than only one model. Finally, the estimates of the $k$ models are aggregated for any given observation. To apply bagging to regression trees, $K$ regression trees are grown on $K$ bootstrapped training sets, and the resulting estimates are averaged. These trees are grown deep and are not pruned. Therefore, each tree has a high variance but low bias. Averaging the predictions of these $K$ trees reduces the variance.
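The procedure can be sketched as follows. This is an illustrative NumPy implementation, not the code used in the study: the tree is a bare-bones regression tree grown by binary recursive partitioning on the sum of squared errors, and the names `fit_tree`, `fit_bagging`, the `min_leaf` parameter and the toy sine data are all assumptions made for the example.

```python
import numpy as np


def fit_tree(X, y, min_leaf=2):
    """Grow an unpruned regression tree by binary recursive partitioning.
    A leaf is a float (mean of y); an inner node is (feature, threshold, left, right)."""
    best = None
    if len(y) >= 2 * min_leaf and y.max() > y.min():
        for j in range(X.shape[1]):
            # Candidate thresholds: every distinct value except the largest.
            for t in np.unique(X[:, j])[:-1]:
                left = X[:, j] <= t
                n_left = int(left.sum())
                if n_left < min_leaf or len(y) - n_left < min_leaf:
                    continue
                sse = ((y[left] - y[left].mean()) ** 2).sum() \
                    + ((y[~left] - y[~left].mean()) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left)
    if best is None:
        return float(y.mean())
    _, j, t, left = best
    return (j, t,
            fit_tree(X[left], y[left], min_leaf),
            fit_tree(X[~left], y[~left], min_leaf))


def predict_tree(tree, x):
    """Follow the splits down to a leaf and return its value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree


def fit_bagging(X, y, k=25, seed=0):
    """Train k trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)   # sampling with replacement
        trees.append(fit_tree(X[idx], y[idx]))
    return trees


def predict_bagging(trees, X):
    """Aggregate by averaging the k tree predictions for each observation."""
    return np.array([np.mean([predict_tree(t, x) for t in trees]) for x in X])


# Toy data, for illustration only.
X = np.linspace(0, 6, 60).reshape(-1, 1)
y = np.sin(X[:, 0])
trees = fit_bagging(X, y, k=25)
pred = predict_bagging(trees, X)
```

Each individual tree fits its bootstrap sample closely (low bias, high variance); the averaging step in `predict_bagging` is what reduces the variance.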

The proposed forecasting method based on decision trees and intuitionistic fuzzy sets
A new algorithm based on intuitionistic fuzzy sets and decision trees is proposed for forecasting purposes. The proposed method uses the clustering method of Chaira (2011) in the fuzzification stage. It also uses the bagging of decision trees for fuzzy relationship determination.
Since the target values of the decision trees are designed as defuzzified forecasts, the fuzzy relationship determination and defuzzification processes are performed together. In the proposed forecasting model, principal component analysis is used to reduce the dimension of the inputs. The intuitionistic fuzzy time series is defined as given in Definition 1 by Egrioglu et al. (2019). As can be seen, an intuitionistic fuzzy time series can be defined as a type of multivariate time series.
Definition 1 $Y_t$ is a time series and $B_1, B_2, \ldots, B_c$ are intuitionistic fuzzy sets. The intuitionistic fuzzy time series $(IF_t)$ can be defined as a multivariate time series as follows: where $\mu_{B_j}(t)$ and $\nu_{B_j}(t)$ are the membership and non-membership values.
In addition, Egrioglu et al. (2019) used the "high order single variable intuitionistic fuzzy time series forecasting model" given in Definition 2 to obtain intuitionistic fuzzy time series forecasts.
Definition 2 The high-order forecasting model can be defined for $IF_t$ in Definition 1 as follows: It is assumed that $G$ is a function and $e_t$ is an error term with zero mean.
The method proposed in this study works according to the forecasting model in Definition 3, which is given for the first time in the literature. While the number of inputs of the forecasting model is $(c+1)p$ in Definition 2, the number of inputs decreases to $q < (c+1)p$ in Definition 3.
Definition 3 (High order single variable intuitionistic fuzzy time series reduced forecasting model) The model given in Eq. (3) is reduced, and the resulting reduced forecasting model is introduced in Eq. (4).
Here, $G$ is estimated by using the bagging of decision trees in the proposed method. In Eq. (4), the $z_i$ variables are principal components obtained from principal component analysis with Eq. (5).
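The dimension reduction from $(c+1)p$ inputs to $q$ principal components can be illustrated with a small NumPy sketch. It is not the study's implementation: the function names `pca_fit` and `pca_transform` are hypothetical, and choosing $q$ by a cumulative explained-variance threshold (`var_ratio`) is an assumption, since the text does not state how $q$ is selected.

```python
import numpy as np


def pca_fit(X_train, var_ratio=0.95):
    """Estimate the PCA projection from the training inputs only.
    Returns the column means and a loading matrix W with q columns, where
    q is the smallest number of components whose cumulative explained
    variance reaches `var_ratio` (an assumed selection rule)."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    q = int(np.searchsorted(np.cumsum(explained), var_ratio) + 1)
    q = min(q, len(s))
    return mean, Vt[:q].T


def pca_transform(X, mean, W):
    """Project inputs onto the retained components (scores z_1, ..., z_q)."""
    return (X - mean) @ W


# Toy input matrix: 6 correlated columns driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X_train = latent @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(100, 6))

mean, W = pca_fit(X_train)
Z_train = pca_transform(X_train, mean, W)   # reduced inputs for the trees
```

Validation and test inputs would be projected with the same `mean` and `W` estimated on the training data, so that no information from later blocks leaks into the fitted projection.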
A new intuitionistic fuzzy time series forecasting method for the application of the forecasting model given in (4) is presented below as an algorithm.
Step 1. The parameters of the method are determined. These parameters are listed below.
Step 3. Calculate the root mean square error (RMSE) values for the validation set by repeating Steps 3.1-3.4 for all $c \in [c_1, c_2]$ and $p \in [p_1, p_2]$.
Step 3.1. Membership ($u^*_{ik}$), non-membership ($v^*_{ik}$) and hesitation degrees ($\pi_{ik}$) are obtained by applying the intuitionistic fuzzy c-means (IFCM) method to the training data. In the application of IFCM, the memberships are first randomly generated. Then hesitation degrees and intuitionistic membership values are calculated with Eqs. (10) and (11), respectively. In (10) and (11), $u_{ik}$ is the initial membership value, $\pi_{ik}$ is the hesitation degree value and $u^*_{ik}$ is the intuitionistic membership value.
While the cluster centres are calculated or updated with formula (12), the intuitionistic membership values are updated with formula (13). Formulas (12), (13) and (14) are applied successively until the change in membership is sufficiently small.
Here, $x_k$ is the $k$th observation in the training data and $m$ is the fuzziness index. Non-membership values are calculated by Eq. (14).
Step 3.2. Input and target value matrices are created separately for the training, validation and test sets.
Step 3.3. Principal component analysis is applied to the training data matrix given in (15).
Step 3.4. The bagging of decision trees is trained using the reduced input data (the principal component score matrix) as input data and the target data given in (19).
Step 4. The hyperparameter values $p$ and $c$ with the smallest $RMSE_{Validation}$ are selected as the best set of hyperparameters ($p_{best}$ and $c_{best}$).
Step 6. The bagging of decision trees is retrained 30 times by using the principal component scores calculated in Step 5 and Eq. (34), with different random initial decision tree parameters. In this step, the methods given in Sect. 2 are applied.
Step 7. Test set RMSE values are calculated for each repetition.
Step 8. The mean and standard deviation of the $RMSE^{j}_{Test}$ values are calculated to evaluate the performance of the method and compare it with other prediction methods.
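The fuzzification in Step 3.1 can be sketched in NumPy for a univariate series. The equations of the algorithm are not reproduced in this text, so the sketch below rests on explicit assumptions: the hesitation degree is computed from a Yager-type complement with a parameter `alpha`, the cluster centres are weighted by the intuitionistic memberships raised to the fuzziness index `m`, and memberships are updated with the usual fuzzy c-means rule. The function name `ifcm_1d` and all default values are hypothetical.

```python
import numpy as np


def ifcm_1d(x, c, m=2.0, alpha=0.85, tol=1e-5, max_iter=200, seed=0):
    """Illustrative intuitionistic fuzzy c-means for 1-D data.
    Returns centres, intuitionistic memberships, non-memberships and
    hesitation degrees, each membership array having shape (c, len(x))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    def complement(u):
        # Assumed Yager-type complement; gives non-membership from u.
        return np.clip(1.0 - u ** alpha, 0.0, None) ** (1.0 / alpha)

    # Random initial memberships; each column sums to one.
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)

    for _ in range(max_iter):
        pi = 1.0 - u - complement(u)      # hesitation degree
        u_star = u + pi                   # intuitionistic membership
        w = u_star ** m
        centres = (w @ x) / w.sum(axis=1)  # weighted cluster centres
        # Fuzzy c-means membership update from distances to the centres.
        d = np.abs(x[None, :] - centres[:, None]) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0)
        change = np.max(np.abs(u_new - u))
        u = u_new
        if change < tol:                  # memberships have stabilised
            break

    nu = complement(u)
    pi = 1.0 - u - nu
    return centres, u + pi, nu, pi


# Toy series with two well-separated groups, for illustration only.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 0.5, 20), rng.normal(10.0, 0.5, 20)])
centres, u_star, nu, pi = ifcm_1d(x, c=2)
```

With these assumptions the hesitation degree is non-negative for `alpha` between 0 and 1, and the returned memberships and non-memberships would supply the additional input columns used alongside the lagged values.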

Applications
In the application, a total of 10 time series with 500 and 250 observations, taken randomly from the S&P500 stock market index opening prices between 2014 and 2018, are used. The information about the time series used in the application is summarized in Table 1.
The time series given in Table 1 are analysed by long short-term memory (LSTM) proposed by Hochreiter and Schmidhuber (1997), pi-sigma artificial neural network (PSGM) proposed by Shin and Ghosh (1991), bootstrapped hybrid artificial neural network (B-HANN) proposed by Egrioglu and Fildes (2022), single multiplicative ANN-based fuzzy time series method (FTS-SMN) proposed by Aladag (2013) and the proposed method (IFTS-TREE). In addition to machine learning methods, some classical methods are also applied to all series. In the implementation of all methods, the same data segmentation and hyperparameter selection procedure is used as for the proposed method. The statistical results and hyperparameter values are shown in Tables 2 and 3, respectively.
The mean statistics obtained for all methods, together with the results of the classical methods, are compiled in Table 4, which also gives the success percentages of the methods.

Conclusion and discussions
The main contribution of the study is to propose an intuitionistic fuzzy time series method using the bagging of decision trees for the first time in the literature. The membership values are obtained from a clustering method. PCA is employed to reduce the dimension of the model inputs. A high order single variable intuitionistic fuzzy time series reduced forecasting model is introduced for the first time in this study.
The proposed method is compared with popular machine learning, fuzzy time series and classical forecasting methods using the S&P500 stock market time series. In 50% of the time series, the proposed method has the lowest RMSE values calculated for the test set. Moreover, the proposed method has the lowest mean rank value, which means it has the best forecasting performance. According to the findings obtained in this study, the bagging of decision trees as a fuzzy relationship identification or modelling tool in fuzzy time series can lead to a successful forecasting performance. In future studies, the proposed method can be improved by using picture fuzzy sets.
In addition, the decision trees used in the proposed method can be made robust to outliers, so that the proposed method as a whole becomes a robust method that is not affected by outliers.
$[c_1, c_2]$: The interval for the number of intuitionistic fuzzy sets
$[p_1, p_2]$: The interval for the number of lagged variables
$n$: The number of total observations for the time series
$n_{train}$: The number of observations for the training set
$n_{val}$: The number of observations for the validation set
$n_{test}$: The number of observations for the test set
Step 2. The observations of the time series are divided into three blocks: training, validation and test sets. Let $\{x_1, x_2, \ldots, x_n\}$ denote the observation values of the time series. In this case, the training, validation and test sets are as follows in block structure.
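The parameters and the block partition above can be tied together in a short Python sketch. It is illustrative only: the blocks are assumed to be contiguous and in chronological order (training, then validation, then test), the lag matrix covers only the lagged values of the series (membership and non-membership columns would be appended in the full method), and `validation_rmse` is a hypothetical callable standing in for Steps 3.1-3.4.

```python
import math
from itertools import product


def block_split(x, n_val, n_test):
    """Split a series into contiguous training, validation and test blocks
    (assumed chronological order; the last n_test points form the test set)."""
    n_train = len(x) - n_val - n_test
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]


def lag_matrix(x, p):
    """Build inputs [x_{t-1}, ..., x_{t-p}] and targets x_t for t = p, ..., n-1."""
    X = [[x[t - j] for j in range(1, p + 1)] for t in range(p, len(x))]
    y = [x[t] for t in range(p, len(x))]
    return X, y


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def select_hyperparameters(c_range, p_range, validation_rmse):
    """Return the (c, p) pair in [c1, c2] x [p1, p2] with the smallest
    validation RMSE. `validation_rmse(c, p)` is a hypothetical callable
    that fits the model for one pair and returns its validation error."""
    grid = product(range(c_range[0], c_range[1] + 1),
                   range(p_range[0], p_range[1] + 1))
    return min(grid, key=lambda cp: validation_rmse(*cp))


# Toy usage with a made-up error surface, for illustration only.
series = list(range(10))
train, val, test = block_split(series, n_val=2, n_test=3)
X, y = lag_matrix(series, p=3)
c_best, p_best = select_hyperparameters(
    (3, 7), (1, 5), lambda c, p: (c - 4) ** 2 + (p - 2) ** 2
)
```

Keeping the blocks contiguous preserves the time ordering, so the validation and test errors are always measured on observations that come after the data used for fitting.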

Table 1
Number of observations, start and end dates of the time series used in the application

Table 2
Statistical results obtained for the test set of all methods

Table 3
The best hyperparameter values obtained for all methods

Table 4
The mean statistics results obtained for all methods