A Tensor-Based Stacked Fuzzy Network for Efficient Data Regression

The random vector functional link network and the extreme learning machine have previously been extended with type-2 fuzzy sets and stacking methods; this extension suggests a natural way to use tensors to construct the learning structure of a type-2 fuzzy sets-based learning framework. In this paper, the type-2 fuzzy sets-based random vector functional link network, the type-2 fuzzy sets-based extreme learning machine and the Tikhonov-regularized extreme learning machine are fused into one network, and a tensor-based way of stacking data is used to incorporate the nonlinear mappings produced by the type-2 fuzzy sets. In this way, the network can learn each sub-structure with the corresponding sub-algorithm, and the three sub-structures are merged into one tensor structure via the type-2 fuzzy mapping results. For the stacked single fuzzy neural network, the consequent-part parameters are learned by matrix regression on the unfolded tensor. The proposed stacked single fuzzy neural network offers a new way to design hybrid fuzzy neural networks with higher-order fuzzy sets and higher-order data structures. The effectiveness of the proposed network is verified on classical benchmarks and with several statistical tests.


Introduction
The random vector functional link (RVFL) network and the extreme learning machine (ELM) are two popular randomized single-hidden-layer feedforward networks, which provide a unified framework for both regression and multi-class classification. The semi-supervised RVFL and ELM networks can be merged into a joint optimization framework, which has been shown to be efficient for moderate-scale data classification (Peng et al, 2020). The parameters can be regularized when ridge regression is used (Yildirim and Revan Özkale, 2019). When singular value decomposition (SVD) is used for iterative solution searching, an SVD update algorithm scales better and runs faster than an SVD computed from scratch (Grigorievskiy et al, 2016). Multi-label learning can also combine a multi-label radial basis function neural network with a Laplacian ELM (Xu et al, 2019): a clustering algorithm determines the number of hidden nodes, the centers of the activation functions are determined by the data itself, and the output is then solved by a Laplacian ELM. Inspired by biological intelligent systems, bio-inspired learning models have bloomed recently (Huang and Chen, 2016; Alencar et al, 2016; Christou et al, 2019), and they have been applied to many areas, such as anomalous trajectory classification (Sekh et al, 2020), long-term time series prediction (Grigorievskiy et al, 2014), T-S fuzzy model identification (Wei et al, 2020), dictionary learning-based image classification (Zeng et al, 2020a), anomaly detection (Hashmi and Ahmad, 2019), HRV recognition (Bugnon et al, 2020), energy systems (Yaw et al, 2020), mislabeled sample detection (Akusok et al, 2015), and concept drift detection (Yang et al, 2020c).
Generalization performance is the main concern for learning algorithms. Balancing computational complexity and generalization ability has been explored with the ELM (Ragusa et al, 2020); with suitably partitioned data and parallel ELMs, large-scale learning tasks can be tackled (Ming et al, 2018), although a tradeoff must be made between efficiency and scalability so that the algorithms retain complementary advantages. With the aid of graph learning and adaptive unsupervised/semi-supervised clustering, flexible and discriminative data embeddings can be achieved (Zeng et al, 2020b; Zheng et al, 2020). By using the regularized correntropy criterion and the half-quadratic optimization technique, both convergence speed and performance surpass the original algorithm (Yang et al, 2020a), and a robust variant has been studied in (Yang et al, 2020b). When an inverse-free recursive algorithm is used to update the inverse of the network's Hermitian matrix, an efficient inverse-free algorithm that updates the regularized pseudo-inverse is obtained for ELMs (Zhu and Wu, 2020).
When a big-data environment is encountered, a fast parameter-selection scheme for modeling the large amount of data is needed; the alternating direction method and the maximally split method can be applied to reduce the number of sub-models and the coefficient-training cost (Lai et al, 2020). Concerning the credibility of the network output for each prediction, a probabilistic output based on the original ELM architecture has been proposed: the iterative way of learning is eliminated and the merits of the ELM are preserved. Using Bayesian inference, multiple-instance learning-based ELM has proven efficient in classification problems (Wang et al, 2020). The optimally pruned ELM (Miche et al, 2010) addresses both regression and classification problems and can counter the effect of noise. Going further, an L2 regularization penalty was applied to the optimally pruned ELM, and a double-regularized ELM using LARS and Tikhonov regularization was proposed (Miche et al, 2011b). The missing-data case for the regression problem is studied in (Yu et al, 2013). When the training-sample selection method is designed with the fuzzy C-means clustering algorithm, the resulting small-training-sample hierarchical ELM can reduce the computational time (Xu et al, 2020).
The random vector functional link network (RVFL) (Zhang and Suganthan, 2016) can also use the techniques mentioned above, such as ridge regression (Zhang and Suganthan, 2017); its extended version, a new learning paradigm named RVFL plus (RVFL+) (Zhang, 2020), has been used in neuro-imaging-based Parkinson's disease diagnosis (Xue et al, 2018; Shi et al, 2019). Motivated by ELMs and RVFLs, the generalized Moore-Penrose inverse and triangular type-2 fuzzy sets were used to extend the ELM, and a tensor-based ELM was proposed (Huang et al, 2019). The RVFL network has also been expanded to the tensor case (Zhao and Wu, 2019), where the type-reduction step for general type-2 fuzzy sets is removed.
Motivated by the material above, we notice that type-2 fuzzy sets and tensor structures provide a new way to model complex data, whether ELMs, RVFLs or neuro-fuzzy systems are used under the tensor structure. Our target is to unveil the links or laws behind the data, so a new tensor-based stacked network for efficient data regression is studied. To inherit the merits of the individual algorithms, a good strategy is to fuse them into one framework, balancing performance against structural simplicity.
To fuse different concepts and techniques into one algorithm, it is necessary to extract different aspects of the data so that different views of the data are obtained; the original algorithms can then be used to minimize the testing error. Returning to type-2 fuzzy sets, they map the data with differently parameterized fuzzy membership functions, using at least three type-1 fuzzy membership functions, so a multi-view representation of the data is obtained. The question that follows is how to fuse these results into one data structure; a tensor is the suitable choice for this type of learning method, and this is the motivation of the work.
The rest of this article is organized as follows: Section 2 introduces the three algorithms used by the stacking system; Section 2.1 introduces the tensor-based type-2 RVFL, Section 2.2 presents the tensor-based type-2 ELM, and Section 2.3 introduces the TROP-ELM, which is used for performance comparison. In Section 3, the structure of the tensor-based hybrid single fuzzy neural network, that is, the stacked single fuzzy neural network, is presented. Simulation results and discussions are given in Section 4. Finally, conclusions are drawn in Section 5.

Preliminary
In this section, the tensor-based type-2 RVFL, the tensor-based type-2 ELM and the Tikhonov-regularized OP-ELM are introduced.

Tensor-based type-2 RVFL
The RVFL usually adopts activation functions such as the Radbas function (y = e^{-s^2}) to construct the network, where y and s denote the output and input of a hidden node, respectively. In the tensor-based RVFL, the enhancement nodes are replaced with interval type-2 (IT2) fuzzy sets. The structure of the tensor-based type-2 RVFL (TT2-RVFL) is shown in Fig. 1. The Radbas activation function of the RVFL is extended to the interval type-2 fuzzy set IT2Radbas, and IT2Radbas is used to construct the IT2-RVFL. The membership function (MF) of the type-2 fuzzy sets in the TT2-RVFL is defined as follows.

Given a testing data set {D_t}_{t=1}^N, where D_t = (x_t, y_t), x_t ∈ R^K with x_t = (x_{t1}, x_{t2}, ..., x_{tK}), y_t ∈ R, and y = [y_1, ..., y_N]^T. A lower MF matrix Φ ∈ R^{N×L×1} can be structured from N × L matrices whose (t, i)-th entries are the lower MF values g_l(a_{il}x_t + b_{il}), where b_{il} is a bias and a_{il} = [w_{i1}, w_{i2}, ..., w_{iK}] (i = 1, 2, ..., L; l = 1, 2) is an input weight vector; both are randomly generated. By the definition of the IT2 fuzzy sets' lower MF, the relationship between the input x_t and the expected output y_t can be approximated by the lower MF matrix. The principal MF matrix Φ̂ ∈ R^{N×L×1} and the upper MF matrix Φ̄ ∈ R^{N×L×1} are structured similarly, with entries ĝ_l(a_{il}x_t + b_{il}) and ḡ_l(a_{il}x_t + b_{il}), respectively.
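The construction of the lower and upper MF matrices above can be sketched in NumPy. This is a minimal illustration, not the paper's exact construction: the uniform weight ranges and the min/max pairing of the two random parameter sets are assumptions, with the Radbas activation taken from the text.

```python
import numpy as np

def radbas(s):
    """Radbas activation: y = exp(-s**2)."""
    return np.exp(-s ** 2)

def it2_mf_matrices(X, L, rng=None):
    """Build lower and upper Radbas MF matrices for N samples and L nodes.

    A sketch: each node i carries two randomly generated (weight, bias)
    pairs (l = 1, 2); the pointwise min of the two activations gives the
    lower MF and the pointwise max gives the upper MF, so the interval
    lower <= upper holds by construction.
    """
    rng = np.random.default_rng(rng)
    N, K = X.shape
    W = rng.uniform(-1.0, 1.0, size=(2, K, L))   # a_{il}, two parameter sets
    b = rng.uniform(-1.0, 1.0, size=(2, L))      # b_{il}
    g1 = radbas(X @ W[0] + b[0])                 # N x L activations, set 1
    g2 = radbas(X @ W[1] + b[1])                 # N x L activations, set 2
    lower = np.minimum(g1, g2)
    upper = np.maximum(g1, g2)
    return lower, upper
```

Because Radbas maps into (0, 1], both matrices are valid membership grades.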
The upper MF values likewise fill a tensor; the slices of this tensor are Φ̄_{:,:,j}, where a_{ij} (i = 1, 2, ..., L; j = 1, 2) is a randomly generated weight vector arranged in the tensor.
Remark 1 The uncertain-weight method (Runkler et al, 2018) is applied to compute the principal MF matrix Φ̂, which reflects the impact of the MF values on the overall defuzzification result of the set. The mapping results weight the interval midpoint by the factor (1 + ū(x) - u(x))^ζ, where u and ū are the lower and upper MF values and ζ > 0 measures the influence of the membership uncertainty on the type-reduction result. The uncertain-weight method generalizes the simple rule of taking the mean of the upper and lower MF values. Formula (4) shows that for ζ = 1 the defuzzification result grows linearly with the uncertainty; for ζ < 1 the growth is sub-linear; and for ζ > 1 it is super-linear.
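Remark 1's type reduction can be sketched as follows. This is one plausible reading of the uncertainty-weight idea, assuming the midpoint of [lower, upper] is scaled by (1 + (ū - u))^ζ; the exact weighting in (Runkler et al, 2018) may differ in detail.

```python
import numpy as np

def uncertainty_weighted_reduction(lower, upper, zeta=1.0):
    """Type reduction via the uncertainty-weight idea (a sketch).

    The midpoint of the interval [lower, upper] is scaled by
    (1 + (upper - lower)) ** zeta, so wider (more uncertain)
    intervals contribute more; zeta controls whether the growth
    is linear (zeta = 1), sub-linear (< 1) or super-linear (> 1).
    """
    mid = 0.5 * (lower + upper)
    return mid * (1.0 + (upper - lower)) ** zeta
```

When the interval collapses (lower = upper), the result reduces to the ordinary type-1 membership value, as expected.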
Remark 2 For the expansion from type-1 fuzzy MFs to IT2 fuzzy sets, when (4) is used for defuzzification, the extension of the IT2 fuzzy set to the enhancement part is called the IT2-RVFL (interval type-2 random vector functional link).
Finally, a 3-tensor Φ ∈ R^{N×L×3} is established from the three foregoing membership function matrices Φ_{:,:,1}, Φ_{:,:,2} and Φ_{:,:,3}. Hereinafter, because of the usage of A in the tensor equations, Φ is renamed A in the next section. From the tensor equation it can be seen that, in the enhancement nodes of the TT2-RVFL, the weighting matrix is an L × 3 matrix. For the TT2-RVFL, the output model can be fused into one matrix by the following equation, where α ∈ [0, 1] is called the equilibrium coefficient of the TT2-RVFL, A_1 denotes the mapping results of the nonlinear interval type-2 activation functions, X_1 contains the weight matrices of the enhancement part, and Y_1 = [y y]; the weight vectors a_i (i = 1, 2, ..., L) from the input layer to the enhancement nodes are generated randomly, so that the activation functions in Φ and Φ̄ are not fully saturated; X_1 is the input matrix structured from the input samples, and Ω_1 = [ω ω] denotes the unresolved input weight matrix.
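The assembly of the 3-tensor and the fused design matrix can be sketched as below. This is an illustrative reading, not the paper's exact model: the flattening of the enhancement tensor into a matrix and the way α balances the direct link against the enhancement part are assumptions, and `fit_output_weights` is a hypothetical helper using a standard ridge solution.

```python
import numpy as np

def tt2_rvfl_design(X, lower, principal, upper, alpha=0.5):
    """Assemble the TT2-RVFL regression matrix (a sketch).

    The three MF matrices are stacked into a 3-tensor Phi of shape
    (N, L, 3); the direct input link X and the (flattened) enhancement
    part are balanced by the equilibrium coefficient alpha in [0, 1].
    """
    Phi = np.stack([lower, principal, upper], axis=2)   # N x L x 3
    enhanced = Phi.reshape(Phi.shape[0], -1)            # unfold to N x 3L
    A = np.hstack([alpha * X, (1.0 - alpha) * enhanced])
    return Phi, A

def fit_output_weights(A, y, reg=1e-6):
    """Ridge solution for the output weights (hypothetical helper)."""
    return np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ y)
```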

Tensor based type-2 ELM
The TT2-ELM (tensor-based type-2 extreme learning machine) was first proposed in (Huang et al, 2019). The advantage of the tensor structure is that the information of the secondary MF can be contained and modeled directly in one high-dimensional array. This avoids the type-reduction operation in type-2 fuzzy reasoning, so the tensor structure can seamlessly embed type-2 fuzzy sets into the ELM. Fig. 3 shows the structure of the TT2-ELM, which is a single-hidden-layer feedforward neural network. The test dataset is expressed as (x_t, y_t), where x_t = (x_{t1}, x_{t2}, ..., x_{tK}) ∈ R^K represents the inputs and y_t ∈ R the output. The mathematical model of the TT2-ELM takes the form A_2 *_N X_2 = Y_2, where A_2 ∈ R^{I_1×···×I_N×J_1×···×J_N} can be reshaped from Φ with the specified size N × L × 3 (the regression tensor's dimensions, N being the number of training patterns), X_2 ∈ R^{J_1×···×J_N} holds the output weights, and Y_2 ∈ R^{I_1×···×I_N} is the output tensor. Obviously, when N = 2, equation (7) degenerates to the matrix case. According to Theorem 1 of (Huang et al, 2019), if there exists an X_2 ∈ R^{J_1×···×J_N} satisfying the multi-linear system (7), then (7) is solvable and its solution is given by (8); if no such X_2 can be obtained, the multi-linear system (7) is unsolvable, and a minimum-norm solution is sought instead.
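The Einstein product that underlies the model A_2 *_N X_2 = Y_2 can be sketched with a single contraction. This is a minimal illustration of the product itself, assuming the standard definition in which the last N modes of A are contracted with the N modes of X.

```python
import numpy as np

def einstein_product(A, X, n):
    """Einstein product A *_n X (a sketch).

    Contracts the last n modes of A with the first n modes of X,
    so A in R^{I_1 x ... x I_m x J_1 x ... x J_n} times
    X in R^{J_1 x ... x J_n} yields a tensor in R^{I_1 x ... x I_m}.
    For n = 1 with matrices this reduces to the ordinary matrix-vector
    product, matching the remark that (7) degenerates to the matrix case.
    """
    return np.tensordot(A, X, axes=n)
```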
The following tensor equation can be obtained.
The gain tensor A_2^T *_N A_2 in equation (9) can be considered a 'square tensor', and the solution of the tensor equation can still be obtained. Based on the theory of tensors and kernels, the following equation can be determined.
where Z is an arbitrary tensor satisfying formula (10); as long as this tensor has a suitable order, the relation holds. A general solution can then be obtained as follows, where the superscript (1) represents the 1-inverse of A_2. The minimum-norm solution of the tensor equation consists of two parts, one of which comes from the null space of the equation A_2 *_N X_2 = 0. For the tensor equations formulated by (7) and (11), we obtain (12); if Z = 0, X_2 is the minimizer. In light of Corollary 2.14(1) of (Behera and Mishra, 2017), the existence of (A_2^T *_N A_2)^{(1)} yields (13), and the least-squares solution of equation (13) is then given by (14). Equation (14) is the minimum-norm solution of equation (7).
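Numerically, the minimum-norm least-squares solution of a multi-linear system can be obtained by unfolding the tensor into a matrix and applying the Moore-Penrose pseudoinverse, which is what equation (14) expresses in tensor form. A sketch under that assumption (the paper works with 1-inverses; `pinv` is the standard matrix realization):

```python
import numpy as np

def minnorm_solve(A, Y, n):
    """Minimum-norm least-squares solution of A *_n X = Y (a sketch).

    A carries I-modes followed by n J-modes; unfold A into a matrix
    of shape (prod I-modes, prod J-modes), apply the Moore-Penrose
    pseudoinverse to the vectorized right-hand side, and fold the
    solution back to the J-shape.
    """
    i_shape = A.shape[:A.ndim - n]
    j_shape = A.shape[A.ndim - n:]
    Amat = A.reshape(int(np.prod(i_shape)), int(np.prod(j_shape)))
    x = np.linalg.pinv(Amat) @ Y.reshape(-1)
    return x.reshape(j_shape)
```

If the system is consistent, the residual is zero and the returned solution has the smallest Euclidean norm among all solutions.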

Tikhonov Regularized OP-ELM
In (Miche et al, 2010), Miche et al. first proposed the OP-ELM, an improvement of the ELM. Fig. 4 shows the Tikhonov-regularized OP-ELM (TROP-ELM) (Miche et al, 2011a). First, the OP-ELM network uses three different types of activation functions to form the kernel, and the assembled kernel improves robustness and generality: the original ELM uses a Sigmoid kernel, while the OP-ELM can use linear, Sigmoid and Gaussian kernels. Second, compared with the original ELM, the multiresponse sparse regression algorithm (MRSR) and leave-one-out (LOO) validation are introduced in the OP-ELM. Their main role is to prune irrelevant variables by pruning the related neurons of the SLFN constructed by the ELM: the MRSR algorithm ranks the neurons according to their usefulness, and the actual pruning is guided by the leave-one-out validation results.

In the previous section, three single fuzzy neural networks, namely the TT2-ELM, the TT2-RVFL and the TROP-ELM, were introduced briefly. In this section, the three networks are stacked into one network, and a regression method based on the three regression results is introduced. The architecture is shown in Fig. 5.
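The LOO validation used by the OP-ELM and TROP-ELM above is cheap because, for a linear output layer, the leave-one-out residuals are available in closed form via the PRESS statistic, with no retraining. A sketch (the tiny ridge term is only for numerical safety and is not part of the PRESS identity):

```python
import numpy as np

def loo_press_mse(H, y, reg=1e-8):
    """Leave-one-out MSE via the PRESS statistic (a sketch).

    For a linear output layer y ~ H @ beta, the LOO residuals are
    e_i = (y_i - yhat_i) / (1 - hat_ii), where
    hat = H (H^T H)^{-1} H^T is the hat matrix, so the LOO error
    is computed from a single fit on the full data.
    """
    G = H.T @ H + reg * np.eye(H.shape[1])
    hat = H @ np.linalg.solve(G, H.T)       # hat (projection) matrix
    resid = y - hat @ y                     # ordinary residuals
    loo = resid / (1.0 - np.diag(hat))      # closed-form LOO residuals
    return float(np.mean(loo ** 2))
```

Evaluating this quantity for increasing prefixes of the MRSR-ranked neurons gives the pruning criterion described above.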
The framework of the tensor-based stacked single fuzzy neural network is designed from the three single fuzzy neural networks (TT2-ELM, TT2-RVFL and TROP-ELM), all of which have been proposed previously. Layer 2 is the hidden layer constituted by the three algorithms, which together construct a tensor structure. The tensor is assembled in layer 3 and is then unfolded into three different matrices, one per single network. The final regression uses a simple normalized scalar weighting, which is the part to be optimized in the future.
The unfolding of the tensor uses the definitions from (Yu et al, 2019), which are given as Definition 1.
Definition 1 ((m, ν)-unfolding) Consider an N-dimensional tensor A ∈ R^{J_1×···×J_N}. Its (m, ν)-unfolding A_{m,ν} rearranges the entries into a matrix such that the (j_1, j_2, ..., j_N)-th entry of A is the entry of A_{m,ν} whose row and column positions are given by the linear index of the multi-dimensional array; see, e.g., (Baranyi, 2016) and (Baranyi et al, 2014).
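A common special case of tensor unfolding, the mode-n unfolding, can be sketched in a few lines; the (m, ν)-unfolding of (Yu et al, 2019) generalizes this by splitting the modes into two groups, so the sketch below is illustrative rather than the exact definition.

```python
import numpy as np

def unfold(A, mode):
    """Mode-`mode` unfolding of a tensor (a sketch of the idea in
    Definition 1).

    Moves the chosen mode to the front and flattens the remaining
    modes, so an N-way tensor becomes a J_mode x (prod of the other
    dims) matrix; every entry of A appears exactly once.
    """
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)
```

For a tensor of shape (N, L, 3), unfolding along the third mode yields a 3 x NL matrix whose rows correspond to the three MF slices.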
Remark 4 By comparing the structures of the TT2-RVFL, TT2-ELM and TROP-ELM, we made a few modifications to the TT2-RVFL and TT2-ELM. Their original activation function is the Sigmoid function, while the TROP-ELM uses linear and Gaussian activation functions in addition to the Sigmoid function. To make the structures correspond and to facilitate the composition of the tensor structure, the linear and Gaussian activation functions were also added to the TT2-RVFL and TT2-ELM. After adding these two additional activation functions, the TT2-ELM and TT2-RVFL obtained better performance than the originally proposed versions in our tests.
The 3-tensor A ∈ R^{N×L×3} generates three matrices whose row count equals the sample number. The three matrices are denoted N_1, N_2 and N_3, respectively, with N_k ∈ R^{N×L}, k = 1, 2, 3, and they can easily reconstruct the tensor A ∈ R^{N×L×3}: N_1 is the first frontal slice A(:, :, 1), the mapping result of the TT2-RVFL; N_2 is the second slice A(:, :, 2), the mapping result of the TT2-ELM; and N_3 is the third slice A(:, :, 3), the mapping result of the TROP-ELM.
For the type-2 fuzzy networks, the LMF, the UMF and the defuzzification of the secondary membership function are used to solve the consequent-parameter learning problem. For the regression layer, N_1, N_2 and N_3 are the three matrices unfolded from the tensor, and the regression equation can be written accordingly. To make the network perform best, the error of each network is first calculated, with the root mean square error as the measurement standard. Second, the best-performing network is identified and used to fit the errors of the other two networks; the processed output is recorded as y'_t. Finally, the outputs are averaged based on y'_t, i.e., ŷ is the average of the results of the three type-2 fuzzy networks; the MF values are obtained from the lower and upper MFs, since the matrices are the unfolded results of the tensor. The defuzzification result can be calculated from the secondary MFs and can also be used by the stacked SFNN. The unconstrained regression result of A_k β_k = y_t can be written as β̂_k. In statistics, this method is known as ridge regression; it is related to the Levenberg-Marquardt algorithm and to Tikhonov's method for regularizing ill-posed non-linear least-squares problems. Suppose that for a known matrix N_k and vector y_t, a vector x is sought such that N_k x ≈ y_t. In most cases, ordinary least-squares estimation leads to an overdetermined (overfitted), or more often an underdetermined (underfitted), system of equations. Therefore, in solving the ill-posed inverse problem, the inverse mapping operator has the undesirable tendency of amplifying noise (the directions amplified most in the reverse mapping correspond to the smallest singular values of the forward mapping). Moreover, ordinary least squares treats every element of the reconstructed x on an equal footing instead of using a model as a prior for x.
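The regression layer described above can be sketched as follows. This is a simplified reading: each unfolded slice N_k gets its own ridge-regressed output weights and the final output averages the three fits; the paper's extra step of re-fitting the weaker networks with the best one is omitted, and the function name is hypothetical.

```python
import numpy as np

def stacked_predict(slices, y, reg=1e-6):
    """Stacked-network regression layer (a sketch).

    `slices` are the matrices N_1, N_2, N_3 unfolded from the tensor A,
    one per sub-network. Each slice is fitted with ridge-regressed
    output weights; the per-network RMSEs are reported and the final
    output is the simple average of the three fits.
    """
    preds, errors = [], []
    for Nk in slices:
        beta = np.linalg.solve(Nk.T @ Nk + reg * np.eye(Nk.shape[1]),
                               Nk.T @ y)
        yk = Nk @ beta
        preds.append(yk)
        errors.append(float(np.sqrt(np.mean((yk - y) ** 2))))  # RMSE
    return np.mean(preds, axis=0), errors
```

The reported error list identifies the best-performing sub-network, which the paper then uses to drive the second training step.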
To minimize the residual sum of squares while the particular solution also satisfies some suitable qualities, a regularization term can be added to the primary minimization problem, which can be succinctly written as min_x ||N_k x - y_t||^2 + ||Γ_k x||^2, where ||·|| is the Euclidean norm and Γ_k is an appropriately selected Tikhonov weighting matrix. In many circumstances, Γ_k is selected as a multiple of the identity matrix, Γ_k = α_k I; with this L2 regularization, solutions with smaller norms are favored (Ng, 2004). At other times, if the underlying vector is believed to be mostly continuous, a low-pass operator can be used to enforce smoothness. This regularization improves the conditioning of the problem and leads to a direct numerical solution. The approximate solution, denoted x̂, is given by x̂ = (N_k^T N_k + Γ_k^T Γ_k)^{-1} N_k^T y_t; each individual algorithm of the stacked single fuzzy neural network can use this regularized result as the learning method for the tensor-unfolded structure.
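The closed-form Tikhonov solution above is straightforward to implement; a minimal sketch, with the function name as an assumption:

```python
import numpy as np

def tikhonov_solve(N, y, Gamma):
    """Tikhonov-regularized least squares (a sketch).

    Minimizes ||N x - y||^2 + ||Gamma x||^2; the closed-form
    solution is xhat = (N^T N + Gamma^T Gamma)^{-1} N^T y.
    With Gamma = alpha * I this is ridge regression.
    """
    return np.linalg.solve(N.T @ N + Gamma.T @ Gamma, N.T @ y)
```

As Γ grows, the solution norm shrinks, which is the smaller-norm preference mentioned above; as Γ vanishes, the solution approaches the ordinary least-squares one.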

Simulation results for the datasets
In this section, the UCI benchmark datasets and four other real-world datasets are tested to evaluate the performance of the method. In all simulations, the root mean square error (RMSE) is used to assess the performance of the TSFNN and the four comparison methods, given by RMSE = sqrt((1/N) Σ_{t=1}^N (ŷ_t - y_t)^2), where ŷ_t is the predicted signal, y_t is the target signal, and N is the length of the testing sequence. All experiments are performed on a computer with an AMD Ryzen 7 4800U (1.80 GHz) and 16 GB RAM. A total of 5000 runs were performed on each dataset.
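The evaluation metric is a direct translation of the formula above:

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean square error over a length-N test sequence:
    RMSE = sqrt( (1/N) * sum_t (yhat_t - y_t)^2 ).
    """
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```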

Regression problems
In this section, ten real-world regression problems are used for testing. Abalone is a dataset used to predict an abalone's age from physical measurements, including the whole weight, shucked weight and viscera weight of abalone in Tasmania; it contains 4447 samples with 9 attributes. The Airfoil Self-Noise dataset was obtained from a series of aerodynamic and acoustic tests of two- and three-dimensional airfoil blade sections in an anechoic wind tunnel; it contains 1503 samples and 6 attributes. The Auto-MPG dataset collects miles-per-gallon data for different car brands; it contains 392 samples with 8 attributes. The Bank dataset simulates customers' patience when selecting their preferred services in a bank according to 8 factors, for example residential area, distance, and virtual temperature-regulating bank options; it contains 8192 samples with 8 attributes. The Concrete Slump dataset contains information about the factors that affect the slump flow of concrete; it includes 103 samples with 11 attributes. Diabetes is a dataset that investigates the dependence of the level of serum C-peptide on various factors, which can be used to measure residual insulin secretion patterns; 768 samples with 4 attributes are included. Delta ailerons and Delta elevators record aileron and elevator data for a delta aircraft; they have 7129 samples with 6 attributes, and 9247 samples with 7 attributes, respectively. Energy efficiency is a dataset obtained by energy analysis of 12 different architectural shapes simulated in Ecotect; it has 768 samples and 8 features, and the regression problem is to forecast 2 real-valued responses, the cooling load and the heating load.
Wine quality white is part of a dataset associated with red and white wine samples; the white-wine portion used here contains 4898 samples with 12 attributes.
The information on the datasets is presented in Table 1; they comprise four small-scale datasets and six moderate-scale datasets. The means and standard deviations of 5000 experimental runs on Abalone, Airfoil self-noise, Auto-MPG, Bank, Concrete slump, Diabetes, Delta ailerons, Delta elevators, Energy efficiency and Wine quality white are shown in Table 2. TT2-RVFL, TT2-ELM, OP-ELM and TROP-ELM are used for comparison.
Results of the Friedman test on these ten datasets for the five methods (TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM and TSFNN) are listed in Table 3. It can be inferred that the training and testing errors of the proposed method are the smallest. The proposed stacked tensor-based hybrid single fuzzy neural network shares the advantages of the TT2-RVFL, TT2-ELM and TROP-ELM, and the tensorization of the fuzzy system may be a way of extending type-2 fuzzy modeling.

Electrical Fault Detection and Classification Dataset
Power systems consist of many complex, dynamic and interactive elements that are always vulnerable to interference or electrical failures. Transmission lines are the most critical part of the power system; their prominent role is to transmit electricity from the source area to the distribution destinations in the network. Faults on power-system transmission lines must first be correctly detected and classified, and should be cleared in the shortest possible time. The Electrical Fault detection and classification dataset contains the current and voltage of the line under different fault conditions (Jamil et al, 2015), covering both the detection of power-system faults and the classification of fault types. The fault-detection part contains 12001 samples with six inputs (I_a, I_b, I_c, V_a, V_b, V_c) and one binary output: no fault is denoted by 0 and a fault by 1. The fault-classification part contains 7861 samples with the same six inputs and four binary outputs (G, C, B and A), corresponding to four generators of 11 × 10^3 V; 0 denotes that no fault occurs and 1 that a fault occurs. The combinations of G, C, B and A represent various failures, which are shown in Table 5.
The faults of the system are judged from the current and voltage of the power system. Fig. 6 shows a no-fault case in the electrical fault detection dataset, and Fig. 7 shows a fault case. The horizontal axis represents the samples and the vertical axis the values of V_a, V_b, V_c, I_a, I_b and I_c. Comparing Fig. 6 and Fig. 7, when there is no fault in the power system the current and voltage values are generally stable and their trends resemble a sine function, which is consistent with the characteristics of AC in the power system. Once a fault occurs, however, the current and voltage values become abnormal, with different patterns for different fault locations; this anomaly can be clearly seen in Fig. 7. The comparison results on the fault-detection dataset, used to determine whether the power system is faulty, are shown in Table 4. The comparison of TT2-RVFL, TT2-ELM, OP-ELM, TROP-ELM and TSFNN on the Electrical Fault detection dataset tests the performance of the five algorithms. The TSFNN results in Table 4 show that its generalization ability is better than that of the other four algorithms. Moreover, comparing Fig. 6 and Fig. 7, the fault data can be regarded as the no-fault data with added noise; hence Table 4 also shows that the disturbance-rejection ability of the TSFNN is better than that of the other four algorithms. For the fault-classification case, we decompose the Electrical Fault classification dataset into five parts according to the different fault locations. Table 5 lists six kinds of faults, but no LL fault ([G, C, B, A] = [0, 0, 1, 1]), which represents a fault between Phase A and Phase B, appears in the dataset.
Through the above analysis, we know that in the power system the fault data can be regarded as the no-fault data with added noise. Thus, among the five extracted datasets, those for the LG, LLG, LLL and LLLG faults can be treated as the no-fault dataset with different noise imposed. Moreover, each extracted dataset represents only one type of power-system fault, so its data are purer and exhibit more obvious characteristics and trends. Through this analysis, the anti-interference and generalization performance of the algorithm can be further verified. The TSFNN results in Table 6 confirm its excellent disturbance rejection and generalization performance.

Asteroid Dataset
The Asteroid Dataset is officially maintained by the Jet Propulsion Laboratory of the California Institute of Technology, an organization under NASA. The dataset is publicly available in the JPL Small-Body Database Search Engine and can also be obtained from Kaggle. Table 7 shows the basic column definitions for the Asteroid dataset. We extracted a portion of the data as a comparison test set: 2500 samples with 7 attributes are used to validate the proposed algorithm. These attributes include the geometric albedo, eccentricity, semi-major axis, inclination angle with respect to the x-y ecliptic plane, Earth minimum orbit intersection distance, and RMS for the asteroid. The comparison results of the five methods are shown in Table 8. The results demonstrate that the TSFNN performs best with respect to training error, while the TT2-RVFL and TT2-ELM perform best in testing error; meanwhile, the performance of OP-ELM and TROP-ELM is poor, and the approach proposed in this paper is a superposition of TT2-RVFL, TT2-ELM and TROP-ELM. The main reason why TSFNN, OP-ELM and TROP-ELM perform well in training and poorly in testing is that these three methods all use multiresponse sparse regression (MRSR), a variable-ranking technique extended from the least angle regression algorithm (Similä and Tikka, 2005; Efron et al, 2004). According to the usefulness of the neurons, the MRSR algorithm obtains a ranking of the neurons in OP-ELM (Miche et al, 2010). TROP-ELM is an improvement of OP-ELM and also applies the MRSR method to its input data. Since the proposed TSFNN includes TROP-ELM, the TSFNN is also affected by the MRSR method.
An important feature of the MRSR is that the obtained ordering is exact in the case of linear problems. The Asteroid dataset collects attributes of asteroids, and the portion of its data used here is nonlinear. In the OP-ELM and TROP-ELM, the mapping between the hidden layer and the output layer is linear, and the role of the MRSR algorithm is to obtain an exact ranking of the neurons; the resulting ordering can then be used to sort the kernels of the model. When the dataset is nonlinear, however, an exact ranking of the neurons cannot be obtained by OP-ELM, and TROP-ELM and TSFNN are likewise affected by this flaw. Therefore, the TSFNN performs well in the training part, but in the testing part, due to the MRSR method, the extracted features cannot be applied well to the testing set, resulting in poor testing-phase performance of the TSFNN.
According to the data in Table 8, the performance of TT2-RVFL and TT2-ELM is the best. Both are constructed with the tensor structure and interval type-2 fuzzy sets. The membership degree of a type-2 fuzzy set is itself characterized by a type-1 fuzzy set; since type-1 fuzzy sets already have a strong ability to deal with uncertainty, type-2 fuzzy sets greatly strengthen the fuzzy system's ability to handle uncertainty and nonlinearity, and they perform well in nonlinear systems with high uncertainty. Therefore, type-2 fuzzy systems have strong generalization ability. The tensor structure is also good at dealing with uncertain systems, which further improves the generalization performance. The merits of type-2 fuzzy sets and of the tensor structure are inherited by the tensor-based type-2 fuzzy systems. On this basis, TT2-RVFL and TT2-ELM perform well on the Asteroid dataset, and their training and testing errors are the smallest in Table 8.
Regarding the test performance of TSFNN: since TSFNN contains TT2-RVFL and TT2-ELM, it compensates for the insufficient generalization ability of TROP-ELM on nonlinear systems. This again demonstrates the excellent generalization ability of type-2 fuzzy systems. The advantages of the stacked tensor-based hybrid single fuzzy neural network indicate that the stacked design can inherit the merits of the constituent algorithms, and that the three stacked algorithms are complementary to each other.

Novel Corona Virus 2019 Dataset
The Novel Corona Virus 2019 dataset contains daily-level information, labelled by date, about the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. It is worth noting that this is time series data, so the counts for any given date are cumulative. The data, originating from national centers for disease control and prevention, are collected on GitHub and updated daily. Eight regions, namely Beijing, Shanghai, Chongqing, Tianjin, Arizona, Washington, California and Illinois, are used for testing, and the time stamps range from 22 Jan 2020 to 29 May 2021.
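Because the counts are cumulative, daily increments follow from first differences. The snippet below is a toy sketch of this standard step; the paper does not state its exact preprocessing, and the numbers are placeholders.

```python
import numpy as np

# Cumulative confirmed-case counts for consecutive days (toy values).
cumulative = np.array([0, 2, 5, 5, 9, 14])
# Daily new cases are the first differences of the cumulative series.
daily = np.diff(cumulative, prepend=0)
# The cumulative series is recovered by a running sum of the increments.
assert (np.cumsum(daily) == cumulative).all()
```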
The extracted data of the eight regions form a small-scale time series dataset with three attributes. Four of the eight regions are from China and four are from the United States, and the outbreaks in both regions are predicted. Since the dataset has only three attributes, the attributes are clearly insufficient, so this experiment tests the performance of the proposed network when the feature attributes of the dataset are scarce. As an excerpt from Table 9, the results on the Beijing dataset (columns: training error mean and standard deviation, testing error mean and standard deviation) are:

TSFNN     5.92e+01  3.74e+00  6.01e+01  7.07e+00
TT2-RVFL  6.25e+01  2.82e+00  6.30e+01  6.67e+00
TT2-ELM   6.25e+01  2.82e+00  6.28e+01  6.63e+00
OP-ELM    7.59e+01  1.15e+01  7.64e+01  1.33e+01
TROP-ELM  7.60e+01  1.14e+01  7.65e+01  1.33e+01
Similarly, the same operation is performed on the four datasets of Arizona, Washington, California and Illinois, and the results are shown in Fig. 9. It can be seen from Fig. 8 and Fig. 9 that, owing to the nature of the COVID-19 data, the overall trend is rising, reflecting its time series character, and the growth rate of each curve reflects the progress of the epidemic. From the figures, these eight datasets are suitable for regression problems and the data are relatively stable. Therefore, in Table 9 there is little difference between TSFNN and the four comparison methods.
Comparing Fig. 8 and Fig. 9, the curves in Fig. 9 are smoother and their overall trend is more obvious. Although the curves in Fig. 8 also show an overall upward trend, the data soon return to a stable state. Therefore, the data in Fig. 9 are more suitable for regression forecasting than the data in Fig. 8, which is also why, in Table 9, the overall performance of the five algorithms on the datasets of the four Chinese cities is not as good as on the datasets of the four US regions.
The above analysis shows that the characteristics of the data themselves lead to the differences in Table 9. Because of this difference, the four Chinese datasets form a useful contrast to the four US datasets. Under the premise that all eight datasets are small-scale time series, the four Chinese datasets vary considerably in their data characteristics and perform poorly in the regression problem, while the US data show a clearer trend and are more suitable for regression analysis.
In addition, the four datasets such as Beijing can be regarded as having more complex data structures than the four datasets such as Arizona, with more diverse features that are not limited to an upward trend. Combined with the actual situation, the four US datasets can better predict the future course of the epidemic in the United States. As far as the raw data are concerned, the four datasets such as Arizona are more "pure" and suitable for regression problems. Therefore, the five methods perform better on these four datasets, while their performance on Beijing and the other three Chinese datasets is relatively poor.
According to the overall data in Table 9, TSFNN still performs best compared with the other four methods. On the four datasets of Beijing, Shanghai, Tianjin and Chongqing, although the overall results are relatively poor owing to the data themselves, the performance of TSFNN remains the best. TSFNN also performs best on the Arizona, Washington, California and Illinois datasets, which have better-behaved data than the four Chinese datasets. This shows the excellent feature extraction and generalization ability of TSFNN, and demonstrates that the tensor-stacked neural network scheme can integrate the advantages of its members and enhance the ability of data feature extraction.

Wind Power Generation Dataset
Wind power generation, or wind energy, is the use of wind to provide mechanical power through wind turbines and to generate electricity with rotating generators. Wind power is a popular and sustainable renewable energy source, and its environmental impact is much smaller than that of burning fossil fuels. The wind power generation dataset is collected from four German transmission system operators, 50Hertz, Amprion, TenneT TSO and TransnetBW, with dates ranging from 23 August 2019 to 22 September 2020. It contains non-normalized power generation data at 15-minute intervals, i.e. 96 points per day, and the measurement unit is terawatt-hours. Hence, there are 397 samples with 96 attributes in the wind power generation dataset.
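One plausible way the 15-minute readings map onto this 397 × 96 sample matrix is sketched below, under the assumption that the readings are stored chronologically; the paper does not give its exact data layout, and the values here are placeholders.

```python
import numpy as np

n_days, per_day = 397, 96                       # days in the range, 15-min slots per day
raw = np.arange(n_days * per_day, dtype=float)  # placeholder chronological readings
X = raw.reshape(n_days, per_day)                # one sample per day, 96 attributes
assert X.shape == (397, 96)
# Row d holds the 96 readings of day d; X[1, 0] is the first slot of day 2.
```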
The wind power generation dataset is a small-to-medium-sized dataset, through which the prediction ability of the model can be verified when the dataset attributes are sufficient. 50Hertz, Amprion, TransnetBW and TenneT TSO manage Germany's east, south, west and north regions, respectively. For wind power systems, generation at different geographical locations has different characteristics, and this location-induced difference is also affected by the seasons: in Germany, spring officially spans March, April and May; summer runs from June through August; autumn covers September, October and November; and winter runs from December to February. Geographical and seasonal factors are therefore important for wind power systems, and by virtue of these characteristics the dataset exhibits different behaviour in different months. Thus the 96 attributes contain rich feature information, which can better assess the performance of the model. The comparison results of TSFNN with TT2-RVFL, TT2-ELM, OP-ELM and TROP-ELM on the 50Hertz, Amprion, TenneT TSO and TransnetBW datasets are shown in Table 10. Overall, the results in Table 10 show that the training error of TSFNN is the best on all four datasets, while its testing error is the best on the 50Hertz, Amprion and TenneT TSO datasets but worse on the TransnetBW dataset. For the 50Hertz, Amprion and TenneT TSO datasets, TSFNN, OP-ELM and TROP-ELM perform better than TT2-RVFL and TT2-ELM; in both training and testing error, TT2-RVFL and TT2-ELM are much worse than the other three algorithms. Among TSFNN, OP-ELM and TROP-ELM, TSFNN has the better performance. This result shows the excellent generalization and feature extraction ability of TSFNN.
In the superposition of TSFNN, its members TT2-RVFL and TT2-ELM are also improved, as shown in Remark 4. The comparison of TSFNN with TT2-RVFL and TT2-ELM shows that composing the kernel space with multiple activation functions can enhance the performance of the network. For the TransnetBW dataset, the results in Table 10 show that TSFNN, OP-ELM and TROP-ELM perform better on the training set than TT2-RVFL and TT2-ELM.
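The idea of composing the kernel space from multiple activation functions can be sketched as follows. This is an illustrative construction only: the function name, the particular activations and the random-feature form are assumptions, not the exact construction of Remark 4.

```python
import numpy as np

def multi_activation_features(X, n_hidden, seed=0):
    """Concatenate randomized hidden layers under several activation
    functions, enriching the feature (kernel) space.  Illustrative sketch;
    the paper's Remark 4 defines the actual composition."""
    rng = np.random.default_rng(seed)
    blocks = []
    for act in (np.tanh, lambda z: 1.0 / (1.0 + np.exp(-z)), np.sin):
        W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
        b = rng.standard_normal(n_hidden)                 # random biases
        blocks.append(act(X @ W + b))
    return np.hstack(blocks)                              # (n_samples, 3 * n_hidden)
```

Each activation contributes a differently shaped nonlinear map, so the concatenated feature matrix spans a richer space than any single activation alone.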
On the testing set, TT2-RVFL and TT2-ELM perform better than the other three methods. Moreover, the performance of TSFNN, OP-ELM and TROP-ELM on the testing set falls far short of their training performance, whereas the performance of TT2-RVFL and TT2-ELM is stable, with no significant difference between the testing and training sets. This indicates that the generalization ability of the type-2 fuzzy system is better than that of the type-1 system; since TROP-ELM is a member of the TSFNN network, the overall test performance of TSFNN suffers. The performance of TSFNN on the training set shows that it extracts the features of this dataset well, which is why good training results are obtained, and the same holds for OP-ELM and TROP-ELM. At the same time, OP-ELM and TROP-ELM also benefit from MRSR: the neurons are ranked by MRSR and then pruned in OP-ELM and TROP-ELM, which yields good training results on datasets with many attributes. However, the ranking and pruning obtained on the training set may not suit the testing data, especially when, as for the wind power generation dataset, there are 96 attributes. This shows that the ranking and pruning of the network has certain limits; its effect on some datasets must be tested, since it may produce large singular values that affect the final results.
As in the previous analysis, the stacked tensor-based hybrid single fuzzy neural network proposed in this paper concentrates the advantages of its member networks, but at the same time it inherits their defects. The performance of TSFNN and the four comparison methods on the TransnetBW dataset in Table 10 confirms this phenomenon, and also shows that the ability to concentrate the advantages of the member networks has an upper limit: when one member network performs poorly, the advantages of the other member networks can hardly compensate for this defect fully.
TSFNN can acquire the advantages of its member networks on different problems by combining different members, and these advantages are then concentrated by the tensor superposition. Meanwhile, the defects of some member networks are complemented by the tensor-stacked network. However, when the performance of some member networks degrades beyond a certain point, this complementary mechanism can no longer eliminate their defects entirely. Once the "threshold" of the complementary mechanism is exceeded, the poorly performing member network will dominate, resulting in poor performance of the whole network.

Conclusions
In this paper, a stacked tensor-based hybrid single fuzzy neural network (TSFNN) was proposed, which is a neural network combination model. TT2-RVFL, TT2-ELM and TROP-ELM are used to form the TSFNN network, and TT2-RVFL and TT2-ELM are optimized with the kernel space method to enhance their performance. TSFNN stacks the hidden-layer outputs of its member networks into a tensor. In this process, the tensor-based superposition inherits the advantages of the type-2 fuzzy sets and the fuzzy reasoning ability of the fuzzy system. Simultaneously, the good performance on linear systems produced by the MRSR and pruning methods in TROP-ELM is also captured by the tensor structure.
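The stacking-and-regression step described above can be sketched as follows. This is a minimal sketch: the helper name, the unfolding mode and the plain ridge regulariser are assumptions standing in for the paper's exact tensor unfolding and Tikhonov regularisation.

```python
import numpy as np

def stack_and_solve(H_list, Y, lam=1e-3):
    """Stack member-network hidden outputs into a 3rd-order tensor,
    unfold it along the sample mode, and solve the output weights by
    Tikhonov-regularised (ridge) matrix regression.

    H_list : list of (n_samples, n_hidden) matrices, one per member network
    Y      : (n_samples, n_targets) target matrix
    """
    T = np.stack(H_list, axis=2)        # (n_samples, n_hidden, n_members) tensor
    n = T.shape[0]
    H = T.reshape(n, -1)                # sample-mode unfolding: (n_samples, h * k)
    # Ridge solution  B = (H^T H + lam * I)^{-1} H^T Y
    B = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ Y)
    return B, H @ B
```

After unfolding, the consequent-part parameters reduce to a single regularised least-squares problem, which is what makes the stacked training fast.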
Because the tensor stacking can concentrate the advantages of the member networks, TSFNN has good generalization and anti-noise ability. TSFNN also inherits the defects of its member networks. In general, the ability of TSFNN to concentrate the advantages of member networks compensates for the shortcomings of some members, but this complementarity is limited; when the data are too complex, underfitting can still occur, for example on the TransnetBW dataset in Table 10.
By and large, the proposed TSFNN algorithm has excellent generalization, anti-noise and feature extraction abilities. These capabilities are demonstrated and validated on 10 UCI standard datasets and 4 real-world datasets. The TSFNN algorithm supplements tensor-based model optimization and model combination methods, indicating that the tensor-stacked neural network is a feasible neural network combination method. To obtain a fast model of the dataset, the tensor is unfolded and the regression results are obtained by matrix regression. Tensor regression and tensor equations can also be applied in this direction, which is a future optimization direction.