Fuzzy transfer learning in time series forecasting for stock market prices

Transfer learning involves transferring prior knowledge gained from solving similar problems in order to reach a solution quickly and efficiently. The aim of fuzzy transfer learning is to transfer such prior knowledge in an imprecise environment. Time series such as stock market data are nonlinear in nature, and stock movements are uncertain, so following the market and making decisions on it is difficult. In this study, we propose a method to forecast stock market time series in situations where prior experience can be used to make decisions. Fuzzy transfer learning (FuzzyTL) is based on transferring knowledge from a source domain and adapting the rules obtained from it. Three different stock market time series data sets are used for a comparative study. We observe that knowledge transfer works well together with smoothing of the dependent attribute, as stock market data fluctuate with time. Finally, we give an empirical application on the Shenzhen stock market with larger data sets to demonstrate the performance of the model. We have explored FuzzyTL in time series prediction to understand its essence and to answer the question of whether FuzzyTL can improve prediction accuracy. From the comparisons, it can be said that fuzzy transfer learning with smoothing improves prediction accuracy efficiently.


Introduction
The stock market is unpredictable, as there are several complex factors influencing its movements. Therefore, the trend of the series is also affected by those factors and by their nonlinear relationships. In stock market forecasting, technical analysis is one of the traditional methods applied by investors for decision making. There are also statistical methods such as the autoregressive conditional heteroscedasticity (ARCH) model (Engle 1982), the generalized ARCH (GARCH) model (Bollerslev 1986), the autoregressive moving average (ARMA) model (Box and Jenkins 1976), and the autoregressive integrated moving average (ARIMA) model (Box and Jenkins 1976). All these models are types of regression model that assume some mathematical distribution, and those distributions are not always followed by realistic stock market time series data. Nowadays, several data mining approaches, such as evolutionary algorithms, artificial neural networks, fuzzy logic, rough set theory, and their hybridizations, have been developed, and all of them perform well in stock market forecasting. A backpropagation neural network has been used to find the fuzzy relationship in fuzzy time series (Huarng and Yu 2006). A hybridized genetic algorithm and neural network model has been developed to predict a stock price index (Nikolopoulos and Fellrath 1994). Caia et al. (2013) proposed a hybrid model based on fuzzy time series together with a genetic algorithm (FTSGA), working on the TAIEX as the experimental data set, and concluded that the model improved accuracy. Teoh et al. (2009) proposed a hybrid model based on multi-order fuzzy time series, using rough set theory to mine fuzzy logical relationships from the time series and an adaptive expectation model to improve forecasting accuracy, on the TAIEX and National Association of Securities Dealers Automated Quotations (NASDAQ) experimental data sets. Pai and Lin (2005) developed a hybrid ARIMA and support vector machine model for stock price forecasting.
There are several pieces of work on fuzzy time series. A novel method on fuzzy time series was proposed by Wang and Mendel (1992), involving fuzzification of a real data set, rule generation, rule reduction to remove rule redundancy, defuzzification to a real value, and finally forecasting from the analysis. The model was applied to truck backer-upper control and Mackey-Glass time series prediction.
Traditional data mining technologies are not capable of handling information when there is a time gap between collection periods. Traditional data mining does not consider the domain transfer concept when inferring from information hidden in data. Transfer learning is a technology that works in situations such as domain difference. It has been successfully applied to several application areas, such as classification problems (Blitzer et al. 2007; Wu and Dietterich 2004), collaborative filtering problems (Li et al. 2009), and a graph-based method for identifying games (Kuhlmann and Stone 2007). Fuzzy logic-based transfer learning is a newer technique that works on information with uncertainty, i.e., in an imprecise environment. Shell and Coupland (2015) proposed a fuzzy logic-based transfer learning (TL) approach to form a prediction model in Intelligent Environments (IEs). They compared their results with the method proposed by Wang and Mendel (1992).
We are motivated by the concept of transferring knowledge within a similar application domain. We have learned from the application of the FuzzyTL prediction model by Shell and Coupland (2015) in Intelligent Environments and applied the knowledge obtained to stock market time series, which differ somewhat from IEs, where the data sources are sensors. In both cases, however, FuzzyTL is applied for prediction, so we adapted the concept as the purpose of both use cases is prediction. The main contributions of this paper are: (1) the multiple attributes of stock market data are represented as labeled and unlabeled data; (2) fuzzy transfer learning (FuzzyTL) has not yet been explored in time series forecasting, so it is explored here, with modification, on stock market price time series; (3) FuzzyTL is applied to three different stock market time series, and the results are compared with the models proposed by Wang and Mendel (1992), Chen (1996), and Cheng et al. (2009); and (4) a case study on Shenzhen stock prices with a larger data set is presented.
The remainder of this paper is organized as follows. First, there is a brief discussion on transfer learning in Sect. 2. In Sect. 3, we introduce fuzzy transfer learning. Our proposed approach is presented in Sect. 4. Data set descriptions are given in Sect. 5. Results and discussion are presented in Sect. 6. A case study on Shenzhen stock prices is elaborated in Sect. 7. In Sect. 8, we summarize our work and discuss its limitations. Finally, our conclusions and future research directions are presented in Sect. 9. The appendix is given thereafter.

Basics of transfer learning (TL)
Transfer learning (TL) contains two principal elements, a Domain and a Task. Pan and Yang (2009) defined a Domain as a pair of two components: a feature space X and a marginal probability distribution P(X), where X = {x_1, x_2, ..., x_n}; and a Task as a pair of two components: a label space Y = {y_1, y_2, ..., y_n} and a predictive function f(.). The predictive function can be learned from the training data, which are pairs (x_i, y_i), x_i ∈ X and y_i ∈ Y. The source domain can be defined as D_s = {(x_s1, y_s1), (x_s2, y_s2), ..., (x_sn, y_sn)}, where x_s ∈ X is the input data point and y_s ∈ Y is the corresponding label. The target domain can be defined as D_t = {(x_t1, y_t1), (x_t2, y_t2), ..., (x_tn, y_tn)}, where x_t ∈ X is the input data point and y_t ∈ Y is the corresponding output.
Transfer learning (TL) can be defined as follows (Pan and Yang 2009; Torrey and Shavlik 2009): given a source domain D_s with a learning task T_s and a target domain D_t with a learning task T_t, the objective of the TL process is to improve the learning of the new task T_t, through the transfer of knowledge from the previously acquired task T_s, by learning the predictive function f_t(.). When the source and target domains are the same (D_s = D_t) and their learning tasks are also the same (T_s = T_t), the learning problem becomes a traditional machine learning problem. From the application point of view, TL techniques can be classified into four categories: (a) neural networks in transfer learning, (b) Bayesian techniques in transfer learning, (c) fuzzy logic in transfer learning, and (d) transfer learning with evolutionary algorithms (Lu et al. 2015).
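The Domain/Task notation above can be made concrete with a minimal sketch. All names here (`Domain`, `Task`, the toy data, and the averaging predictor) are hypothetical illustrations, not part of the paper's method:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Domain:
    X: List[Tuple[float, ...]]  # feature vectors x_i from the feature space

@dataclass
class Task:
    Y: List[float]                            # labels y_i from the label space
    f: Callable[[Tuple[float, ...]], float]   # predictive function f(.)

# Source domain D_s: labeled pairs (x_i, y_i). In TL, the f learned here is
# transferred to a target domain D_t when D_s and T_s resemble D_t and T_t.
source = Domain(X=[(1.0, 2.0), (2.0, 3.0)])
source_task = Task(Y=[2.5, 3.5], f=lambda x: sum(x) / len(x))

# Reusing the source's predictive function on an unseen target input
print(source_task.f((4.0, 6.0)))  # → 5.0
```

When D_s = D_t and T_s = T_t, this reduces to ordinary supervised learning on a single domain, as the definition notes.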
In this work, we are focusing on fuzzy transfer learning in stock market time series analysis and prediction. It is discussed briefly in the following section.

Fuzzy transfer learning (FuzzyTL)
FuzzyTL combines fuzzy logic (FL) and TL to bridge the knowledge gap through a learning process and an adaptation from the learning process of one context to another. The change of context may be due to a change of domain, missing information, a change of situation, etc. TL has the ability to transfer knowledge from one situation to another; here, knowledge can be in the form of information.
There are two distinct processes: one is transferring the fuzzy concepts along with their interrelationships, and the other is the adaptation of the fuzzy concepts. In the first process, source data are used to develop a fuzzy inference system (FIS), which consists of fuzzy sets and fuzzy rules. The FIS captures the knowledge from the source and is used to transfer it to the target task. The second process is the adaptation of the FIS. The adaptation process applies the knowledge learned from previous information to the unlabeled task data set, adapting the individual components of the FIS to capture the variations in the data. Alterations and variations from situation to situation are absorbed through changes made within the domains of the fuzzy sets and adaptations of the rulebase.
• Fuzzy framework for producing output
1. In FuzzyTL, a source domain D_s can be defined as D_s = {(x_1, x_2, y)_i : i = 1, ..., N}, where x_1, x_2 ∈ X are inputs, y ∈ Y is an output, and N is the number of data tuples in the source domain. The domain of each input and output is an interval, i.e., the universe of discourse of that particular variable, so a source domain can also be described by the intervals [min, max] of its inputs and output. A domain with fuzzy sets over two inputs and one output is defined as (fX_s1, fX_s2, fY_s), where fX_s1 and fX_s2 are the fuzzy inputs and fY_s is the fuzzy output. The universe of discourse of each input and output is partitioned into an equal or unequal number of partitions, as required. The membership function of the fuzzy sets can be Gaussian, triangular, etc.
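Partitioning a universe of discourse into Gaussian fuzzy sets, as described above, can be sketched as follows. The choice of sigma (so that neighbouring sets cross at roughly 0.5 membership) is an assumption of this sketch, not a prescription from the paper:

```python
import math

def gaussian_partitions(lo, hi, k):
    """Partition the universe of discourse [lo, hi] into k Gaussian fuzzy
    sets with equally spaced centres. Sigma is chosen (an assumption here)
    as the half-width at half-maximum of the spacing between centres."""
    step = (hi - lo) / (k - 1)
    sigma = step / (2 * math.sqrt(2 * math.log(2)))
    centres = [lo + i * step for i in range(k)]
    return [(c, sigma) for c in centres]

def membership(x, centre, sigma):
    """Gaussian membership degree of x in the set (centre, sigma)."""
    return math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))

# 10 linguistic regions over the BSE UOD [19000, 29000] used in Sect. 6
sets = gaussian_partitions(19000.0, 29000.0, 10)

# Membership of x = 21000 in each region; the maximum decides its label
degrees = [membership(21000.0, c, s) for c, s in sets]
print(degrees.index(max(degrees)))  # → 2 (the third linguistic region)
```

The same partitioning applies to each input and the output; unequal partitions would simply use hand-chosen centres instead of an even grid.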

Fuzzy rulebase
A rulebase containing two antecedent sets and one consequent set can be constructed as P rules of the form "If X_1 is A_p and X_2 is B_p, then Y is C_p", where X_1 and X_2 are the inputs, Y is the output, and P is the number of rules. This rulebase is called an exhaustive rulebase when P = N. If N is large, the number of rules is also large, so the rulebase is reduced by the mechanism proposed by Shell and Coupland (2015), which is discussed below in the step Transferring fuzzy concepts.
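Constructing the exhaustive rulebase, one rule per labeled tuple, can be sketched as below. The bucket-width labeller is a hypothetical stand-in for the maximum-membership assignment described in the text:

```python
def make_rule(x1, x2, y, label_of):
    """One rule from a single labeled tuple (x1, x2, y): each value is
    mapped to the linguistic region where it belongs, giving the rule
    'If X1 is A and X2 is B, then Y is C'."""
    return (label_of(x1), label_of(x2)), label_of(y)

def exhaustive_rulebase(data, label_of):
    # One rule per data tuple, so P = N for N tuples (the exhaustive case)
    return [make_rule(x1, x2, y, label_of) for x1, x2, y in data]

# Hypothetical labeller: 10 buckets of width 1000 over [19000, 29000]
label_of = lambda v: min(9, int((v - 19000) // 1000))

rules = exhaustive_rulebase(
    [(21500, 21000, 21200), (21600, 21100, 21300)], label_of)
print(rules)  # two identical rules; such duplicates motivate rule reduction
```

With real data many tuples fall in the same regions, so the exhaustive rulebase contains heavy repetition, which is exactly what the reduction step removes.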
After reducing the rulebase, the fuzzy sets are defuzzified to obtain the output of the target domain using the method proposed by Wang and Mendel (1992), given in the Appendix.

Transferring fuzzy concepts
This is the first stage of generating the FIS (fuzzy inference system) of FuzzyTL. This method uses the numerical values of labeled data, i.e., input-output pairs, to produce the sets and rules discussed previously. A rule reduction method is used to reduce the repetition of rules and the impact of anomalous data, and to increase the use of the information most supported by the numerical data. The fuzzy frequency measure (Shell and Coupland 2015) is used to form the reduced rulebase. This concept extends the Wang-Mendel method.
In the standard Wang-Mendel method (Wang and Mendel 1992), the membership values of each data instance are used to decide the strength of a rule. In Shell and Coupland (2015), the FuzzyTL framework uses a frequency value for each rule, according to the number of repetitions of its antecedents, to capture more information from the exhaustive rulebase. FuzzyTL uses the following fuzzy measure function for each rule:

w(F_r) = exp(−(F_r − c)^2 / (2σ^2)),

where F_r, the frequency of each rule, is the input to the function, c = F_max denotes the frequency of the highest occurring rule, σ = (F_max − F_min) × 0.5, and F_min denotes the frequency of the lowest occurring rule.
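A sketch of this frequency-based rule weighting follows. The Gaussian form is our reading of the quantities c and σ named in the text (the exact functional form in Shell and Coupland 2015 may differ), so treat it as an assumption:

```python
import math

def rule_weight(freq, f_max, f_min):
    """Fuzzy frequency measure sketch: a Gaussian over rule frequency with
    c = F_max and sigma = 0.5 * (F_max - F_min), matching the quantities
    described in the text. The Gaussian shape itself is an assumption."""
    c = float(f_max)
    sigma = (f_max - f_min) * 0.5 or 1.0  # guard against F_max == F_min
    return math.exp(-((freq - c) ** 2) / (2 * sigma ** 2))

# The most frequent rule gets weight 1.0; rarely fired rules are damped
# and can be dropped to form the reduced rulebase.
print(rule_weight(10, 10, 2))            # → 1.0
print(round(rule_weight(2, 10, 2), 3))   # → 0.135
```

Ranking rules by this weight and keeping the strongest per antecedent pair yields a reduced rulebase that favours frequently supported rules over anomalies.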

• Adaptation of the fuzzy concepts
The adaptation of the fuzzy concept consists of the following five stages:

External input domain adjustment
Input domains of the source are adapted according to the target task when a knowledge gap occurs because target inputs fall outside the respective intervals. An input domain is adapted only if the target inputs require it: the input intervals are widened whenever a value extends beyond the left or right boundary, with the boundaries adjusted based on the data from the target domain. Each input is compared with the respective interval; if it is less than the left boundary, the left boundary is decreased to that input value, and if it is greater than the right boundary, the right boundary is increased to that input value.
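The boundary rule above can be sketched directly:

```python
def adjust_boundaries(interval, target_values):
    """External input domain adjustment: widen the source interval
    [left, right] whenever a target input falls outside it."""
    left, right = interval
    for v in target_values:
        if v < left:
            left = v      # decrease the left boundary to the new value
        elif v > right:
            right = v     # increase the right boundary to the new value
    return (left, right)

# Source UOD [19000, 29000]; target data stray past both boundaries
print(adjust_boundaries((19000, 29000), [18500, 25000, 29750]))
# → (18500, 29750)
```

Values already inside the interval leave it unchanged, so the adjustment only ever enlarges the source domain to cover the target data.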

Internal input domain adjustment
This stage focuses on input domain adaptation when the knowledge gap occurs due to fully or partially overlapped intervals of the source and target tasks. It also works on input domains: the source input domains are transferred and adapted according to the target input domains to cover the knowledge gap. The whole process is described below in a few steps.
Step 1 The target input intervals are compared to the source input intervals [x l , x r ].
Step 2 The adaptation procedure uses the local minimum and maximum of the target values to compare with the source values.
Step 3 If one or both of these values lie within the interval represented by the source values x_l and x_r, a proximity measure (Shell and Coupland 2015) is calculated to decide whether the domain is able to adapt or not. The measure is taken by assigning a membership function based on the source input domain interval.
Step 4 In this procedure, a threshold value is considered, above which the membership function of those intervals is adapted. Otherwise, the local minimum and maximum are accepted as the lower and upper end points of the target values.
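Steps 1-4 can be sketched as below. The triangular shape of the proximity membership and the 0.5 threshold are assumptions of this sketch; the text only says a membership function over the source interval and a threshold are used:

```python
def proximity(value, x_l, x_r):
    """Proximity sketch: a triangular membership over the source interval
    [x_l, x_r], peaking at its midpoint (one plausible reading of the
    proximity measure of Shell and Coupland 2015)."""
    if value < x_l or value > x_r:
        return 0.0
    mid = (x_l + x_r) / 2.0
    half = (x_r - x_l) / 2.0
    return 1.0 - abs(value - mid) / half

def adapt_internal(t_min, t_max, x_l, x_r, threshold=0.5):
    """Adapt only when the target local min/max sit well inside the source
    interval; otherwise keep the target's own end points (Step 4)."""
    if min(proximity(t_min, x_l, x_r), proximity(t_max, x_l, x_r)) > threshold:
        return (x_l, x_r)      # adapt: reuse the wider source domain
    return (t_min, t_max)      # keep the target local min and max

# Target extremes well inside the source interval → source domain adopted
print(adapt_internal(23000, 25000, 19000, 29000))  # → (19000, 29000)
```

When the target interval pokes outside or hugs the source boundaries, the proximity drops below the threshold and the target's local extremes are kept instead.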

Output domain adaptation
The adaptation of the output generated from the framework is based on the target domain itself. A gradient control mechanism (Shell and Coupland 2015) is used to adapt the knowledge gap in the target consequent sets. The steps are described below.
Step 1 Sliding-window data of size n from the source and target domains are collected for each input x ∈ X and output y ∈ Y. The output values for the target domain are generated beforehand by the fuzzy framework itself, using the Wang and Mendel (1992) defuzzification procedure.
Step 2 Gradients are calculated for each input and output of the source data set, taking the mean and standard deviation of the elements within the window of size n. The gradients are obtained by normalization based on the standard score, defined as

z = (x − x̄) / σ,

where z is the gradient of a particular input or output value x, x̄ is the mean, and σ is the standard deviation of the sliding window.
Step 3 Gradients are compared with each other at each individual input value.
Step 4 The differences between the source and target gradients show the knowledge gaps that must be adapted to improve the handling of the unlabeled data. The consequent adaptation can be expressed as

d_Da = φ (g_s − g_t),

where d_Da denotes the delta used to adapt the consequent sets, φ is a learning parameter that can be user defined (Shell and Coupland 2015), and g_s and g_t are the gradients of the source and target data for sliding window size n, respectively. Positive differences between the source and target output gradients produce a reduction in the domain, whereas negative differences initiate an enlargement of the domain.
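A sketch of the gradient comparison follows. Comparing the most recent gradient of each window and the default value of the learning parameter φ are assumptions of this sketch; the text says gradients are compared at individual values but does not fix the aggregation:

```python
def gradients(window):
    """Standard-score gradients for a sliding window:
    z = (x - mean) / std, as in Step 2 above (population std)."""
    m = sum(window) / len(window)
    var = sum((v - m) ** 2 for v in window) / len(window)
    sd = var ** 0.5 or 1.0  # guard against a constant window
    return [(v - m) / sd for v in window]

def consequent_delta(src_window, tgt_window, phi=0.1):
    """Delta to adapt the consequent sets: phi times the difference of the
    latest source and target gradients (comparison point and phi value are
    assumptions here). Positive deltas shrink the output domain,
    negative deltas enlarge it, as described in Step 4."""
    gs, gt = gradients(src_window), gradients(tgt_window)
    return phi * (gs[-1] - gt[-1])

# Source trending up, target trending down → positive delta (reduction)
print(round(consequent_delta([1, 2, 3, 4], [4, 3, 2, 1]), 3))  # → 0.268
```

The normalization makes windows on different price scales directly comparable, which is what allows a source stock's trend to inform a target stock's output domain.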
The previous three stages consider the domain adaptation of the fuzzy sets. The following stages describe the adaptation of the fuzzy rulebase.

Rulebase modification according to the adaptation rulebase
The exhaustive rulebase is used to produce an adaptive rulebase. The reduced rulebase and the adaptive rulebase are then compared to obtain the final rulebase: the reduced rules are examined and applied to the target domain data to check their applicability. The exhaustive rulebase is evaluated iteratively to find the most used rules with greater weighting, which indicates greater applicability of those rules within the target domain. The steps are listed below.
Step 1 Examine the exhaustive rulebase to identify the rules that fire using the target input data.
Step 2 Rules that fire with the highest membership value for each data point are kept as the adaptive rulebase.
Step 3 The adaptive rulebase is compared to the reduced rulebase, and the better rule is kept in the adaptive rulebase. If any rule is in the reduced rulebase but not in the adaptive rulebase, that rule is added to the adaptive rulebase.

Rule adaptation using Euclidean distance measure
The knowledge gap is not fully covered by the previously learned information, and new information is needed to remove the incompleteness. To do so, the formation of antecedent sets based on domain adaptation, described in the previous stages, is required to move the input domains of the target toward the true state. In the case of the consequent set, the Euclidean distance of the source input values from the target input values is evaluated. The source output corresponding to the closest source input values represents the target output of the corresponding target inputs, for which the Euclidean distance measure is calculated.
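The Euclidean-distance rule adaptation described above amounts to a nearest-neighbour lookup over the source input pairs, which can be sketched as:

```python
def euclidean_consequent(target_input, source_data):
    """Rule adaptation via Euclidean distance: the source output whose
    input pair (x1, x2) lies closest to the target input pair stands in
    for the unknown target output."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    _, best_y = min(((dist(target_input, (x1, x2)), y)
                     for x1, x2, y in source_data),
                    key=lambda pair: pair[0])
    return best_y

# Two labeled source tuples (high, low, close); the target (high, low)
# pair is nearest to the second tuple, so its close stands in
source = [(21500, 21000, 21200), (24000, 23500, 23800)]
print(euclidean_consequent((23900, 23400), source))  # → 23800
```

This fills consequent gaps that the adapted rulebase still cannot cover, at the cost of a linear scan over the source data per target point.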

Proposed approach
Transfer learning (TL) can transfer knowledge, in the form of information gained through past experience, for use in the same or similar problem domains. In this paper, FuzzyTL is applied with modification to study stock market data, where we use a set of labeled data to acquire knowledge and then use the gained knowledge for prediction on unlabeled data. Stock market data have basic indices such as high price, low price, opening price, closing price, and volume; among them, the high and low prices represent the input pair, whereas the closing price represents the output. In the case of labeled data, the mentioned inputs and output are known. As an example, 241 data points are used as labeled data and 84 data points as unlabeled data of the BSE stock market, where the inputs (high and low prices) are known but the output closing prices are assumed unknown. Another important point is that, as the data at the current time point are highly dependent on just the previous historical data points, we have used adaptive forecasting, a smoothing technique based on just the previous history along with a smoothing factor. The labeled data set is fuzzified to form the FuzzyTL framework mentioned previously. Then, n rules of an exhaustive rulebase are formed based on maximum membership in their respective input or output regions, where n data points are considered in the labeled data. To reduce the exhaustive rulebase, the method in the section Transferring fuzzy concepts is used. Then, the Wang-Mendel defuzzification method is used to obtain the output for the unlabeled data set.
To transfer the knowledge present in the domain as well as in the rulebase, the steps External input domain adjustment, Internal input domain adjustment, and Output domain adaptation, discussed previously, are utilized. To transfer knowledge from the rulebase, we follow separate steps, which are not the same as the fourth and fifth stages described in the previous section.

Adapted rule reduction
In this section, we describe the modification of FuzzyTL for our approach. The output values for the target domain are generated beforehand by the fuzzy set framework itself, using the Wang and Mendel (1992) defuzzification procedure. Then, using the target inputs, we find the universe of discourse of the target output and partition it into regions. Gaussian memberships are calculated in each region for each target output. The maximum membership gives the inclusion of each target value in a particular region, and the rules are generated. There is some repetition among the rules, so rules with the same antecedent and consequent fuzzy sets form a group and are represented by one rule. Using those combined rules and the Wang-Mendel defuzzification method, we obtain the final target output values.
In another methodology, the current output is calculated using a smoothing factor on just the previous output. This is called adaptive forecasting. The equation is given below:

F_t = α · O_{t−1} + (1 − α) · defuzzF_t,

where F_t is the predicted value at the t-th time point, O_{t−1} represents the closing price value at the (t−1)-th time point (as it is the output in our work), α is the smoothing factor with 0 < α < 1, and defuzzF_t is the defuzzified value of the t-th data point.
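The adaptive forecasting step can be sketched as a convex combination of the previous observed close and the defuzzified FuzzyTL output. The weighting below follows the quantities named in the description (the previous output, the defuzzified value, and a smoothing factor between 0 and 1); the exact placement of α is our reading of the equation:

```python
def adaptive_forecast(prev_close, defuzzified, alpha):
    """Adaptive forecasting: blend the previous observed closing price
    O_{t-1} with the defuzzified FuzzyTL output for time t, weighted by
    a smoothing factor 0 < alpha < 1."""
    assert 0 < alpha < 1, "smoothing factor must lie strictly in (0, 1)"
    return alpha * prev_close + (1 - alpha) * defuzzified

# e.g. yesterday's close 21200, defuzzified FuzzyTL output 21350
print(round(adaptive_forecast(21200.0, 21350.0, 0.6), 2))  # → 21260.0
```

Larger α leans on the previous observation, which suits a series whose current value depends heavily on its immediate history, as the stock data here do.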

Data set descriptions
Stock market data are multi-attribute in nature, and stock market prediction depends on several factors, so it is quite difficult to build forecasting models for it. As there are several factors behind stock exchange ups and downs, the data are nonlinear in pattern. In our work, basic indices of stock exchange time series data, namely the high price, low price, and closing price, are used to build the models. Time series data from three different stock exchanges are collected as labeled and unlabeled data sets.
• Bombay stock exchange (BSE) BSE's popular equity index, the S&P BSE SENSEX, is India's most widely tracked stock market benchmark index.
There are several types of work, i.e., stock market analyses and forecasting models, on BSE data (Gangwar and Kumar 2014; Joshi and Kumar 2012). In our work, labeled data were collected daily from January to December 2014 (241 data points) and unlabeled data daily from January to April 2015 (84 data points) [24].

• New York stock exchange (NYSE)
NYSE is an American stock exchange, heavily analyzed in several research works (Leigh et al. 2002; Granger 1992). In our work, a labeled data set collected daily from January to December 2014 (252 data points) and an unlabeled data set collected daily from January to April 2015 (82 data points) are used [25].

• Taiwan stock exchange corporation (TAIEX)
TAIEX is the Taiwan stock exchange benchmark index. It is also heavily used in several analyses (Caia et al. 2013; Teoh et al. 2009). In our work, a labeled data set collected daily from January to December 2014 (247 data points) and an unlabeled data set collected daily from January to April 2015 (81 data points) are used [26].

Results and discussion
Stock indices of high and low price are used as the input pair and the closing price as the output. The numbers of labeled and unlabeled data points are described in Sect. 5 for each of the BSE, NYSE, and TAIEX time series.
Here, each input and output data set is partitioned into 10 overlapping regions, represented by 10 linguistic variables. The maximum and minimum are used to decide the universe of discourse (UOD). Table 1 shows the maximum and minimum for the high price (input1), low price (input2), and closing price (output) of BSE. The UOD is [19000, 29000] for input1, input2, and the output. Gaussian membership is calculated for each data point in each region. Based on the maximum membership value, a linguistic variable is assigned to the respective data point. Then, an exhaustive rulebase is obtained.
To reduce repeated rules, the method proposed by Shell and Coupland (2015) is followed, and we obtain the number of groups and the members in each group, as shown in Table 2. Table 3 depicts the reduced rulebase. The input domains are adapted using the first three stages of the adaptation of the fuzzy concepts method described in Sect. 3.
The time complexity of an algorithm depends on the execution time of each statement of the code. In the FuzzyTL code, as well as in the Wang-Mendel code for time series prediction, a block of code generates the exhaustive rulebase for the source domain. In the case of transfer learning, the number of unlabeled data points is less than the number of labeled data points; the labeled data are considered as the source domain and the unlabeled data as the target domain. In that block of code, a for-loop runs N^2 times, where N is the number of source data points, so the time complexity of the Wang-Mendel code is O(N^2). FuzzyTL takes slightly more time than the Wang-Mendel method, as the same block of code is also used for the target domain. If M is the number of target data points, then the time complexity of FuzzyTL is O(N^2 + M^2).

Case study on Shenzhen stock prices
Economic conditions in China depend hugely on stock price movements. The Shenzhen stock exchange is one of the three stock exchanges in China, and there are several existing works on it (Zhou et al. 2015; Lin and Yang 2009). Opening, high, low, and closing prices of 30 stocks are collected and analyzed. There are 573 data points for each stock, from 11 April 2004 to 3 December 2015. Among these long-period time series data points, we have considered 80% as labeled data and the remainder as unlabeled data. Deciding the domains or UODs of the 20% target data points is difficult for some Shenzhen stock prices, because the time series have sudden ups and downs. The domains are adapted using Wang and Mendel (1992) after generating the set of reduced rules from the labeled data. Some of the stocks' original and predicted movements on the unlabeled data are shown in Fig. 4. An RMSE comparison between the Wang-Mendel method (Wang and Mendel 1992) and FuzzyTL is shown in Table 6. It is observed that the forecasting model using FuzzyTL performs better in most cases than the model by Wang and Mendel (1992). Boldface fonts indicate the better results in Table 6.

Table 3 Reduced rulebase of labeled BSE data:
If x1 is l3 and x2 is l3, then y is l3
If x1 is l3 and x2 is l2, then y is l2
If x1 is l2 and x2 is l2, then y is l2
If x1 is l3 and x2 is l2, then y is l3
If x1 is l2 and x2 is l1, then y is l2
If x1 is l4 and x2 is l3, then y is l3
If x1 is l4 and x2 is l3, then y is l4
If x1 is l4 and x2 is l4, then y is l4
If x1 is l5 and x2 is l4, then y is l4
If x1 is l5 and x2 is l5, then y is l5
If x1 is l6 and x2 is l5, then y is l5
If x1 is l7 and x2 is l5, then y is l6
If x1 is l6 and x2 is l6, then y is l6
If x1 is l7 and x2 is l6, then y is l6
If x1 is l7 and x2 is l6, then y is l7
If x1 is l7 and x2 is l7, then y is l7
If x1 is l8 and x2 is l7, then y is l8
If x1 is l8 and x2 is l7, then y is l7
If x1 is l8 and x2 is l8, then y is l8
If x1 is l9 and x2 is l8, then y is l9
If x1 is l9 and x2 is l9, then y is l9
If x1 is l9 and x2 is l8, then y is l8
If x1 is l10 and x2 is l9, then y is l9
If x1 is l10 and x2 is l9, then y is l10
If x1 is l10 and x2 is l10, then y is l10

[Table 2: groups of repeated rules with their member data-point indices; only a partial fragment is recoverable here]

Statistical significance test
In this subsection, we use the nonparametric Wilcoxon signed-ranks test to test for a significant difference between the two methods, (1) the WM method and (2) FuzzyTL, in time series forecasting. This statistical test examines the null hypothesis that both methodologies perform equally well. Data sets on 29 stocks of the Shenzhen stock exchange are used to observe the performance of the two methodologies. The RMSEs of the actual and predicted values of the two methodologies are tabulated in Table 6, together with the differences, absolute differences, and ranks based on the differences. The exact test statistic, say T, is calculated as min(R+, R−), where R+ is the sum of the ranks for the data sets on which the second methodology outperformed the first, and R− is the sum of the ranks for the others. For a larger number of data sets N, the statistic

z = (T − N(N + 1)/4) / sqrt(N(N + 1)(2N + 1)/24)

is approximately normal, and if z is smaller than −1.96 at confidence level α = 0.05, the null hypothesis can be rejected. Here, in our case, T = min(R+, R−) = min(401, 34) = 34, N = 29, and z = −3.96785. So we conclude that the null hypothesis is rejected and that the two methodologies differ significantly.

Summarization and limitation
Our motivation for this work was to implement fuzzy transfer learning concepts in time series prediction. For the domain transfer concept, we have used multi-attribute time series.
To adapt the transferred knowledge, the domains of discourse are adjusted, and then the rulebase is generated. The proposed approach has been implemented on stock market time series, and comparisons of results with traditional fuzzy time series models have been provided. The main advantage of this work is therefore the implementation of transfer learning in time series prediction. There are two main steps in FuzzyTL: domain adjustment and rulebase generation. For our model building, we have treated the unlabeled data as labeled data. In creating an adapted rulebase, we have used the Wang and Mendel (1992) defuzzification procedure, as we are dealing with unlabeled data. If FuzzyTL were applied to predict properly labeled data, classification algorithms could be implemented to obtain the predicting attributes. Using properly labeled data and deciding the predicted attributes for the reduced rulebase have not been considered in our work, which may be regarded as its limitations.

Conclusion
A novel FuzzyTL approach for time series forecasting has been proposed in this paper. The concept of learning from labeled data and transferring the knowledge (or experience) to make decisions on unlabeled data is applied to some extent. The proposed approach has advantages as well as some limitations due to the use of unlabeled data. It has been observed in previous works on time series analysis that the value at the current time point is highly dependent on just the previous, or some of the previous, time point values. We have implemented this observation here, as it holds for stock market data. The Wang-Mendel method (Wang and Mendel 1992) used this characteristic of time series to predict the Mackey-Glass time series, and we have applied the same concept to stock price time series prediction. It is observed that FuzzyTL, both with and without smoothing, performs better than the Wang-Mendel method (Wang and Mendel 1992). It also performs well in comparison with some traditional fuzzy time series models, where the transfer learning concept is not considered. In the future, we can apply all five stages of TL adaptation (discussed in Sect. 3) for better performance. Rule reduction can also be improved by using rough set theory.