Optimized Intelligent Auto-Regressive Neural Network Model (ARNN) for Prediction of Non-Linear Exogenous Signals

The paper presents the prediction of non-linear exogenous signals by an optimized intelligent auto-regressive neural network model (ARNN). A signal comprises two sets of data, called deterministic and error: the former represents the degradation index of a signal, while the error captures the uncertainties associated with it. To understand and predict signals, an intelligent approach is taken using the ARNN model. The deterministic component is predicted by developing a neural network based non-linear autoregressive model, and the error component by a linear stochastic model. The final forecast is formed by combining the results from the two models and is evaluated using mean square error results. Validation of the prediction is obtained through a comparison of the results with existing models: ARNN reduced the MSE by 14-36%, the RMSE by 19-46%, and the NMSE by 18-42% compared to existing models. Moreover, the ARNN model can be used for low- as well as high-volatility data elements. The results show that the proposed model provides improved predictions and minimizes high dependence on design parameters at low computational cost.


Introduction
The field of acoustics studies the propagation and interaction of mechanical waves, such as sound waves, through different media like air or water. Various methods can be used to model and understand the behavior of those waves, including stochastic, deterministic, and statistical methods based on the underlying physical mechanics of sound waves. Many prediction models presented in the literature are either model-based, data-driven, or both. Model-based techniques rely on mathematical models that can realistically predict behaviour from fundamental physical parameters and can predict the linear component of a time series. Data-driven techniques, on the other hand, are statistically intensive: they rely on artificial intelligence to sift through a huge amount of data and create a model that forecasts the desired phenomenon. This allows data-driven techniques to be highly flexible despite the complexity of the system being modeled. Several such data-driven techniques have been proposed by researchers [1-3]. Since these techniques rely on historical data to build the predicting model, they require a vast amount of failure data to accurately interpolate and infer a model. Although this might be hard to achieve in some cases, these models are more flexible and can adapt accurately to the complex non-linear components of any system. However, compared to model-based techniques, in which physics and mathematics dictate the model's behaviour, data-driven techniques are more prone to error, as any lapse in coherent information can lead to an inaccurate prediction model. Despite this, their flexibility makes them favorable when artificial intelligence is used to extrapolate models from historical data, regardless of the complexity of the system.
Predicting time series has become a focal point of research, where several unique techniques have been proposed, including a method of local modeling by [4]. Similarly, models for time series prediction that utilize an ensemble of methods were proposed by [5]. Building upon this, [6] described a methodology based on Lyapunov exponents to provide multidimensional predictions; however, given the complexity and dynamics of time series, the modeling process has remained difficult. To tackle this issue, researchers have turned towards non-linear prediction techniques, which have been shown to deliver better prediction accuracy [7,8]. One of the most prominent methodologies is neural networks, which have the potential to improve the prediction accuracy of non-linear time series. Recurrent neural networks can be used to develop multi-step-ahead prediction due to their flexible internal memory structure, which accounts for the historical data when predicting future values [9-11]. In addition, researchers have begun looking at hybrid models that further improve prediction accuracy, such as the combined autoregressive integrated moving average model proposed by [3], which is coupled with a neural network model. These models were used in conjunction to accurately forecast popular time-series data. Another interesting hybrid methodology, proposed by [12,13], incorporated both parametric and nonparametric techniques to predict the exchange market. These methodologies follow a similar procedure: they first predict a linear relation using a model-based technique and then compare the actual values with the predicted results using a data-driven technique that assesses the residuals between the two. The result of this comparison is compounded within the model and the procedure is repeated, thus reducing the error between actual and forecast values. However, some hybrid approaches are only suitable for short-term future prediction [14-16].
Some existing approaches are discussed in Table 1.
In this paper, an optimized intelligent prediction model is proposed to predict time series signals. This recursive model involves a neural network based nonlinear autoregressive model and an autoregressive moving average model. In the proposed approach, the deterministic and error components are determined separately from the actual time series data. A neural network based nonlinear autoregressive model (NN-NARX) is developed to find the dimension of the embedded network, while the autoregressive moving average model (ARMX) is used to calculate the error, defined as the difference between the residual and the actual data. The NN-NARX and ARMX models are deployed in unison to predict the past and the future, respectively, and the final prediction is the sum of the results attained from the single proposed dynamic model. However, one issue often experienced with time series prediction techniques is estimating the embedding dimension of the network. This problem is mitigated by utilizing the Cao method [22] to estimate the embedding dimension. It separates the deterministic and stochastic signal elements, determines the embedded network dimension, and generates the phase space that is synchronized with the hidden units of the neural network.

Error Estimation Methods
Error estimation methodology is used to identify the system model by minimizing a cost function that measures the difference between the actual and the estimated output. The fundamental model used for error estimation assumes that the measured system output is the input signal passed through a filter plus the effect of disturbances e_t. Mathematically, the model is

y_t = B(q^{-1}) u_t + e_t

where u_t is the input vector and B(q^{-1}) is the filter kernel. The characteristics of the model depend on the structure of B(q^{-1}) and the additive noise e_t [6].
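The basic error-estimation model above can be sketched numerically: the output is a FIR filtering of the input plus additive noise, and minimizing the squared-error cost over the kernel coefficients is a linear least-squares problem. The kernel values, signal lengths, and noise level below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Simulate y_t = B(q^-1) u_t + e_t with an assumed 3-tap kernel.
rng = np.random.default_rng(0)
b_true = np.array([0.5, -0.3, 0.1])           # kernel of B(q^-1)
u = rng.standard_normal(500)                  # input signal u_t
e = 0.05 * rng.standard_normal(500)           # disturbance e_t
y = np.convolve(u, b_true)[:500] + e          # measured output

# Least-squares estimate of the kernel minimizes the cost
# sum_t (y_t - B(q^-1) u_t)^2 over the unknown coefficients.
nb = 3
Phi = np.column_stack(
    [np.concatenate([np.zeros(k), u[:500 - k]]) for k in range(nb)]
)
b_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
# b_hat should lie close to b_true for low noise levels.
```

The regressor matrix simply stacks delayed copies of the input, which is the standard way such filter models are identified.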

Autoregressive with Exogenous Inputs Model(ARX)
The ARX model is used to describe the global properties of the system and determine the FIR model coefficients. An autoregressive model with exogenous inputs is specified as [23]

A(q^{-1}) y_t = B(q^{-1}) u_t + e_t

where A(q^{-1}) is a polynomial in the delay operator and B(q^{-1}) is given as in (3). This implies that the model equals

y_t = (B(q^{-1})/A(q^{-1})) u_t + (1/A(q^{-1})) e_t

so the noise is modelled by a factor of 1/A and multiplied with the dynamics model. It is important to note that ARX does not model noise and dynamics independently [24].
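Because the noise in ARX passes through the same 1/A dynamics as the input, identification reduces to linear least squares on lagged outputs and inputs. A minimal sketch, assuming second-order polynomials (orders and coefficient values are illustrative, not from the paper):

```python
import numpy as np

# Simulate an ARX system: y_t = a1*y_{t-1} + a2*y_{t-2}
#                              + b1*u_{t-1} + b2*u_{t-2} + e_t
rng = np.random.default_rng(1)
a = np.array([0.6, -0.2])
b = np.array([1.0, 0.5])
N = 800
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(2, N):
    y[t] = (a[0] * y[t - 1] + a[1] * y[t - 2]
            + b[0] * u[t - 1] + b[1] * u[t - 2]
            + 0.02 * rng.standard_normal())

# ARX identification: regress y_t on its own lags and input lags.
Phi = np.column_stack([y[1:N - 1], y[0:N - 2], u[1:N - 1], u[0:N - 2]])
theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
# theta recovers [a1, a2, b1, b2] up to noise.
```

This one-shot regression is exactly why ARX is computationally cheap but cannot shape the noise model separately from the dynamics.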

Autoregressive Moving Average with Exogenous Inputs Model (ARMX)
In ARMX, the parameterized dynamics of the noise are more flexible than in the ARX model. ARMX extends the ARX structure by providing more flexibility for modeling noise using the C parameters (a moving average of white noise). This makes ARMX the preferable option when the input is dominated by disturbances known as load disturbances. The ARMX model, an autoregressive model with exogenous inputs and a moving average model for the disturbances, is given as [25]

A(q^{-1}) y_t = B(q^{-1}) u_t + C(q^{-1}) e_t

where A(q^{-1}) is as in (4) and B(q^{-1}) is given as in (2). The model output is

y_t = (B(q^{-1})/A(q^{-1})) u_t + (C(q^{-1})/A(q^{-1})) e_t

The ARMX methodology follows the equation-error models: the observed output is a sum of three regression expressions over previous inputs, previous outputs, and white noise. It is conceivable to interpret ARMX as described in Fig. 1, i.e. as the parallel connection of a deterministic part driven by the observed input u_t and a stochastic part driven by a remote white process e_t. The deterministic part is characterized by the transfer function B(q^{-1})/A(q^{-1}) and its output y_0(t) is not accessible. The stochastic part is characterized by the transfer function C(q^{-1})/A(q^{-1}) and its output is a noise term that signifies the effect of the white noise on the state of the deterministic part. The observed output is the sum of the two. The process noise and measurement noise are assumed to be independent with zero mean; the white Gaussian noises are represented as e_t ~ N(0, σ²_e) and n_d ~ N(0, σ²_n). The autocorrelation used for error estimation, denoted ρ(n), can be derived and depends only on the lag.
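The lag-only dependence of ρ(n) for a stationary error sequence can be checked directly: for white noise the sample autocorrelation is 1 at lag 0 and near zero elsewhere. A small sketch (the sequence below is synthetic white noise, not the paper's data):

```python
import numpy as np

# Sample autocorrelation rho(n) of a zero-mean stationary sequence.
# For a white process it is 1 at lag 0 and approximately 0 at
# every nonzero lag.
def autocorr(e, lag):
    e = np.asarray(e, dtype=float) - np.mean(e)
    return float(np.sum(e[:len(e) - lag] * e[lag:]) / np.sum(e * e))

e = np.random.default_rng(4).standard_normal(5000)  # white noise e_t
rho0 = autocorr(e, 0)   # 1.0 by definition
rho1 = autocorr(e, 1)   # near zero for white noise
```

If the residuals of a fitted ARMX model show significant autocorrelation at nonzero lags, the C polynomial order is typically increased.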

Neural Network-Nonlinear Autoregressive Model(NN-NARX)
One of the more flexible architectures for nonlinear modelling is the neural network. Such a network is built with two or more layers; generally, a network comprises three layers: an input, a hidden, and an output layer. The number of inputs, along with the number of neurons in every hidden layer, can be varied. The neurons are processing units which are cyclically linked.
In the proposed approach, the neural network utilizes time series data to develop a Neural Network Nonlinear Autoregressive model. The sequence y_t is modeled as a nonlinear function of y_{t-1}, ..., y_{t-N}, as shown in (10):

(10) y_t = g(y_{t-1}, y_{t-2}, ..., y_{t-N}) + e_t

where g is a nonlinear function and e_t is a noise or error term.
A NN model that is also suitable for modeling nonlinear systems and time series is the Nonlinear Autoregressive with Exogenous Input (NARX) neural network. It is considered a dynamic neural network because it connects several hidden layers of the network to the input layer through recurrent feedback [26]. The NARX model is used to predict the deterministic component, as it is the preferable option for nonlinear problems. A single-input single-output (SISO) system with true output y_0(t) ∈ R and measurement-noise-corrupted output y_t ∈ R is modeled mathematically as

(11) y_{t+1} = g(y_t, u_t) + e_t
(12) y_t = y_0(t) + n_d

where u_t and y_t represent the input and output, respectively, at time t. The state vector is composed of the true system outputs, y_t = [y_t, ..., y_{t-n_y+1}]^T; similarly, the state vector of inputs is u_t = [u_t, ..., u_{t-n_u+1}]^T. g(·) is some nonlinear function, and u_t ∈ R is the known system input. The model orders n_u and n_y are the dynamic orders of the input and output, respectively. The NARX dynamics of Eq. (11), written as a state equation, are

(13) y_{t+1} = A y_t + B g(y_t, u_t) + e_t

The measurement equation, which describes the mapping from the unobserved states to the observed output signal, is Eq. (12). The discrete-time nonlinear equation of NARX can be represented as

y(n) = g(y(n-1), ..., y(n-d_y), u(n-k), ..., u(n-k-d_u))

where u(n) and y(n) are the input and output of the model at discrete time step n, d_y ≥ 1 and d_u ≥ 1 with d_u ≤ d_y are the memory orders of the output and input, respectively, and the parameter k is a delay or process dead-time.
A NARX architecture with single input and output time series signals and a Series-Parallel (SP) network are used in training mode. In this case, the actual values of the system's output are used to form the output's regressors [5].
Estimated values are denoted by a hat symbol ( ^ ). A Levenberg-Marquardt (LM) training algorithm was used to train the NN model in this paper.
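Building the series-parallel training set amounts to stacking lagged actual outputs and inputs into a regressor matrix; the network g(·) is then fitted to map each row to the next output. A minimal sketch, with orders ny = nu = 2 chosen for illustration:

```python
import numpy as np

# Series-parallel NARX training data: regressors come from the ACTUAL
# past outputs and inputs, y(t) = g(y_{t-1..t-ny}, u_{t-1..t-nu}).
def narx_regressors(y, u, ny=2, nu=2):
    start = max(ny, nu)
    rows = []
    for t in range(start, len(y)):
        # most-recent lag first: [y_{t-1}, ..., y_{t-ny}, u_{t-1}, ...]
        rows.append(np.concatenate([y[t - ny:t][::-1], u[t - nu:t][::-1]]))
    return np.array(rows), y[start:]

y = np.arange(10, dtype=float)   # toy output sequence
u = np.ones(10)                  # toy input sequence
X, target = narx_regressors(y, u)
# X has one row of 2*ny regressors per predictable time step.
```

In practice X and target would be fed to an MLP trained with Levenberg-Marquardt, as the paper does; the toy sequences here are only to show the regressor layout.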

Proposed Optimized Intelligent Auto-Regressive Neural Network Model (ARNN)
Different methodologies are utilized for the prediction of time series signals; however, due to the complexity of such signals, neither the ARMX nor the NARX model alone is a suitable predictor. Using a neural network on highly noisy data leads to convergence problems if the number of neurons is too small, and to overfitting if the number of neurons is too large. The ARMX model, on the other hand, is not suitable for data that are nonlinear or that fail the stationarity condition. In this paper, an optimized intelligent auto-regressive neural network model (ARNN) is proposed, in which the data are partitioned into two components: deterministic and error. The deterministic component is generated by a filtering method and predicted by the NN-NARX network, while the error component is handled by the ARMX model. The estimated output of the embedded NN-ARMX network is

(14) f̂(n+1) = f'_0( b_0 + Σ_h w_{h0} f'_h( b_h + Σ_i w_{ih} u(n-i) + Σ_j w_{jh} y(n-j) ) )

where i = 1, 2, ..., d_u, j = 1, 2, ..., d_y and h = 1, 2, ..., N_h; w_{ih}, w_{jh}, and w_{h0} are the weights; b_h and b_0 are the biases; and f'_h(·) and f'_0(·) are the hidden and output activation functions, respectively. This network is trained by the back-propagation algorithm, and stochastic gradient descent is used to update the weights. This technique reduces the variance in the estimate of the gradient, and an adaptive learning rate is used to minimize the computation time. Additionally, the number of past observations used as the input for the prediction model is called the embedding dimension, and selecting it is a difficulty often experienced in time series estimation techniques. In the proposed approach, the embedding dimension is pre-determined by the Cao method, which separates the deterministic and stochastic signals [22].
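The decomposition step at the heart of ARNN can be sketched as follows: a filtering pass produces the deterministic component, the residual becomes the error component, and the two components are modeled and forecast separately before being summed. The moving-average filter and window length below are illustrative choices, not the paper's exact filter.

```python
import numpy as np

# Split a series into a smooth deterministic component (filter output)
# and an error component (residual); their sum reconstructs the signal.
def decompose(y, window=5):
    kernel = np.ones(window) / window
    deterministic = np.convolve(y, kernel, mode="same")
    error = y - deterministic
    return deterministic, error

rng = np.random.default_rng(2)
t = np.linspace(0.0, 4.0 * np.pi, 200)
y = np.sin(t) + 0.1 * rng.standard_normal(200)  # synthetic noisy signal
d, e = decompose(y)
recon_ok = np.allclose(d + e, y)  # exact reconstruction by construction
```

In the paper's scheme, d would be fed to the NN-NARX predictor and e to the ARMX model; the final forecast is the sum of the two individual forecasts.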

Performance Evaluation
The ARNN intelligent model is trained by creating a correlation between the predicted value of the time series and its residuals, and the original time series. This is done by providing the predicted values and the residuals as the input and the original time series as the output. The prediction performance is then evaluated using the mean squared error (MSE), root mean squared error (RMSE), and normalized mean squared error (NMSE), according to Eqs. (20), (21), and (22), respectively.
where N is the length of the data and y_i, ŷ_i, and ȳ are the observed data, the predicted data, and the average of the observed data, respectively.
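Eqs. (20)-(22) are not reproduced in this extract; the sketch below uses the standard definitions of MSE and RMSE, and normalizes NMSE by the total squared deviation from the observation mean, which is one common convention and is assumed here rather than taken from the paper.

```python
import numpy as np

# Standard error indices for evaluating a prediction against data.
def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

def rmse(y, yhat):
    return float(np.sqrt(mse(y, yhat)))

def nmse(y, yhat):
    # Normalized by deviation of observations from their mean.
    return float(np.sum((y - yhat) ** 2)
                 / np.sum((y - np.mean(y)) ** 2))
```

A perfect prediction gives 0 for all three indices; NMSE = 1 corresponds to a predictor no better than the mean of the observations.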

Result and Analysis
The dynamic nature of a complex time series signal was used to evaluate the proposed optimized ARNN model. The filtered time series data were a set of 901 points, as shown in Fig. 2. In general, the Jarque-Bera normality test can be performed to diagnose whether a given sequence is normally distributed. This test determines whether the kurtosis value of the data equals 3:

(23) k = E[(y − E[y])^4] / (E[(y − E[y])^2])^2

In Eq. (23), y is the variable for which the kurtosis is being computed, and E stands for the expected value. If the sequence has k = 3, corresponding to low-volatility data elements, the ARMX model is suitable for prediction; for k ≥ 3, i.e. high-volatility data elements, the predictions are obtained by using an ANN model. The proposed ARNN approach can be used for prediction whether k = 3 or k ≥ 3. This not only reduces computation time but also reduces error, since the linear component could otherwise be modeled by a nonlinear model if the sequence were not separated. Simulation results show that when the Jarque-Bera normality test is applied to 900 samples of de-trended time series data (as shown in Fig. 3), a kurtosis value of k = 5.2 is obtained, whereas the proposed optimized ARNN technique yields k = 2.5 for the predicted data of the same time series. The NARX estimation model is then built utilizing these data. To generate the NARX model, the embedding dimension is first determined by using the Cao method. It estimates the embedding dimension of a time series by comparing the change in distance between two adjacent points from dimension x to dimension x + a. The extracted deterministic and stochastic signals are clearly shown in Fig. 4. This embedded dimension generates the phase space that is synchronized with the hidden units of the neural network; the ARNN model therefore utilizes the network's dimension to minimize the error and reduce computation time.
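The kurtosis of Eq. (23) is straightforward to compute from sample moments; a minimal sketch on synthetic Gaussian data (the data here are generated for illustration, not the paper's time series):

```python
import numpy as np

# Kurtosis as in Eq. (23): k = E[(y - E[y])^4] / (E[(y - E[y])^2])^2.
# A normally distributed sequence has k = 3, which is what the
# Jarque-Bera test checks.
def kurtosis(y):
    c = np.asarray(y, dtype=float)
    c = c - c.mean()
    return float(np.mean(c ** 4) / np.mean(c ** 2) ** 2)

gaussian = np.random.default_rng(3).standard_normal(100_000)
k = kurtosis(gaussian)   # close to 3 for Gaussian data
```

Values of k well above 3 indicate heavy tails (high volatility), which is the regime the paper routes to the neural-network predictor.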
Two significant quantities, the first and second deterministic components E1(d) and E2(d), should be determined, as shown in Fig. 5. E1(d) is utilized to find the minimum embedding dimension as the value at which it reaches its saturation point. E2(d) is employed to resolve a difficulty in real computations: deciding whether E1(d) is still gradually increasing or has stopped changing once the embedding dimension is sufficiently large.
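The time-delay embedding underlying Cao's quantities can be sketched as follows; E1(d) and E2(d) are then computed from nearest-neighbor distances between consecutive dimensions, a step omitted here for brevity. The toy series and parameters are illustrative.

```python
import numpy as np

# Map a scalar series into d-dimensional phase-space vectors
# [y_t, y_{t+tau}, ..., y_{t+(d-1)tau}], the construction that Cao's
# E1(d)/E2(d) statistics operate on.
def embed(y, d, tau=1):
    y = np.asarray(y, dtype=float)
    n = len(y) - (d - 1) * tau
    return np.column_stack([y[i * tau : i * tau + n] for i in range(d)])

X = embed(np.arange(8.0), d=3)   # 6 phase-space vectors of dimension 3
```

Each row of X is one reconstructed phase-space point; in the ARNN model the chosen dimension d is synchronized with the number of hidden units of the network.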
In this paper, 900 samples of the time series sequence were applied; the prediction performance results are shown in Fig. 6. The results show that the proposed ARNN model achieved a significant improvement in prediction response compared to the ARMX and NARX models applied separately.
The predicted errors are then assessed using an error histogram that represents the distribution of the prediction error in every instance, as shown in Fig. 7, which presents the error histograms for the proposed scaled ARMX, NARX, and ARNN models.
The proposed model has a regression value that approaches 1, as shown in Fig. 8; a regression value of 1 is considered a best-fit model.
The estimation and prediction methods are used to forecast the future states of the data. The prediction results are assessed by the error between the forecasted values and the actual values in the testing set. Table 2 shows the comparison of the MSE, RMSE, and NMSE of the three models applied to the time series data. In this table, all error parameters of the ARNN model are lower than those of the other traditional algorithms.

Conclusion
The proposed technique is an amalgamation of an embedding theorem and a Neural Network NARX to predict the deterministic component, and an ARMX model to verify the stationarity condition along with the error components of a time series. First, it is determined whether the sequence is normally distributed by conducting the Jarque-Bera and kurtosis hypothesis tests. These tests provide information regarding the normality of the sequence and separate the high- and low-volatility data elements. Then, Cao's embedding theorem is utilized to extract the embedding dimension and time delay from the original time series. The embedding dimension and time delay are then developed into a phase space and synchronized with the number of neurons of the optimized neural network, which overcomes the dependency on design parameters and minimizes the computational cost. Finally, the NN-NARX algorithm is trained using a learning rate, gradient descent, and momentum to estimate the values of the embedded phase-space points. This analysis was repeated multiple times to average the result and reduce statistical errors. The Levenberg-Marquardt method is also used to train the NN-NARX to accurately predict the original time series and residuals. This procedure allowed the proposed technique to provide more accurate results compared to other experimental models. The results were validated and the model's performance was evaluated by using error indices that compared the predictions with other models in the literature. It was observed that ARNN reduced the MSE by 14-36%, the RMSE by 19-46%, and the NMSE by 18-42% compared to existing models.

Comparison of error indices for existing models:
[17]: 0.267, 0.0310, 0.392
NARX (Non-Linear Autoregressive Exogenous Neural Network) [18]: 0.198, 0.207, 0.491
ARIMA (Autoregressive Integrated Moving Average) [20]: 0.364, 0.462, 0.398