Echo State Network Optimization: A Systematic Literature Review

In recent years, numerous studies have demonstrated the importance and efficiency of reservoir computing (RC) approaches. The choice of parameters and architecture in reservoir computing, on the other hand, frequently leads to an optimization task. This paper presents an overview of related work on echo state network (ESN) and deep echo state network (DeepESN) optimization, collecting research papers through a systematic literature review (SLR). The review covers 129 items published from 2004 to 2022 that are concerned with this issue. The collected papers are selected, analysed and discussed. The results indicate that two families of parameter-optimization techniques (bio-inspired and non-bio-inspired methods) have been used extensively, and that different models employ bio-inspired methods for optimization in a variety of fields. The potential of particle swarm optimization (PSO) has also been noted. A significant portion of the research in this field focuses on the study of reservoirs and how they behave in relation to their unique qualities. To test reservoirs with varied parameters, topologies, or training techniques, the NARMA, Mackey-Glass, and Lorenz time-series prediction datasets are the most commonly employed in the literature. This review debates diverse points of view about ESN hyper-parameter optimization, metrics, time-series benchmarks, real-world applications, evaluation measures, and bio-inspired and non-bio-inspired techniques, and it identifies and explores a number of research gaps.


Introduction
In the last decades the use of Reservoir Computing has been increasing [1], especially with the efficiency of the echo state network. The ESN [2] is a variant of Recurrent Neural Networks (RNN). The name of the ESN springs from the echoes of the input history: these echoes represent the randomly generated reservoir activation states, which encode the input history. The ESN is distinguished from other RNNs by its hidden layer, which is a large-scale sparsely connected reservoir, and by its training process, which trains the output connection weights with a linear regression method [3]. The ESN is known for its simple structure and high prediction precision. Accordingly, this approach allows training RNNs with competitive computational capability [3]. Training an ESN nevertheless requires experience, as the randomly created reservoir relies on a set of global parameters. These parameters have to be defined correctly to obtain good results. To boost ESN performance, researchers have optimized a number of parameters [4] to increase efficiency.
In this paper, we undertake a Systematic Literature Review that includes the most often cited and most recent strategies for improving the ESN. The motivation of this review is to conduct a coherent study of ESN and DeepESN optimization, to give a clear statement of related works, to interpret and discuss them, and to provide baselines for future research.
The following are the significant distinctions between our SLR and other recent surveys: 1. It is, as far as we know, the first systematic literature review on this topic; 2. It provides an in-depth list of related works on optimized echo state networks; 3. It focuses on recent work, as most of the related papers were published recently; 4. It classifies, analyses and discusses the selected papers.
This paper first describes the theoretical background on Reservoir Computing, the echo state network and the Deep ESN. Then, it presents the methodology followed to select the papers dealing with the main topic of this paper. The outcomes of the data extraction and interpretation are then presented, followed by synthesis discussions. Finally, it concludes and presents suggestions for future research.

Echo State Network
The ESN is a powerful variant of Recurrent Neural Networks (RNN) [5]. Like all RNNs, an ESN has three layers: the input, hidden, and output layer. The key idea of the ESN is its very simple way of learning: the early layers are kept random and fixed, and only the last layer, a simple linear regression that transforms the reservoir states into the predicted output, is trained.
The rapidity of the ESN is due to the fact that only the connection weights between the hidden and output layers must be trained, so learning is very fast. It is also important to set the random connections carefully so that the ESN dynamics neither die out nor explode. The ESN's structure is given in Fig. 1. The hidden layer in an ESN is a big reservoir layer. From the input layer to the reservoir layer, the connection weights are Win, while W holds the connection weights of the reservoir neurons. From the hidden layer to the output layer, the connection weights are Wout (the bias weights are denoted Wbias_out). A Wback connection from the output nodes back to the reservoir (shown dotted in Fig. 1) is optional. The ESN can make multi-step predictions when Wback is employed; however, a single-step forecast is sometimes sufficient. The input at time step t is u(t), and the number of neurons in the reservoir layer is Nr. The reservoir state of the Nr neurons is x(t). In the output layer, the output activity of the L neurons is y(t). Each time an input u is entered, the reservoir states are updated.
As shown in Eqs. (1) and (2), the internal state of the ESN neurons is updated at every step. The activation function of the hidden layer is f; to handle the transmission of the input signal from the input layer to the reservoir, f is typically a nonlinear function such as tanh or sigmoid. Here x(n) is the reservoir's updated internal state, and y(n) is the output of the linear readout function f_out.
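Equations (1) and (2) are not reproduced here; the standard ESN update they describe can be sketched as follows (a minimal sketch assuming a tanh activation and a purely linear readout; all weight values below are illustrative):

```python
import numpy as np

def esn_step(x, u, W_in, W, f=np.tanh):
    # Eq. (1): reservoir state update x(t) = f(W_in u(t) + W x(t-1))
    return f(W_in @ u + W @ x)

def esn_readout(x, W_out):
    # Eq. (2): linear readout y(t) = W_out x(t)
    return W_out @ x

# Tiny example: a 3-neuron reservoir, one input, one output
rng = np.random.default_rng(0)
W_in = rng.uniform(-0.5, 0.5, (3, 1))
W = rng.uniform(-0.5, 0.5, (3, 3))
W_out = rng.uniform(-0.5, 0.5, (1, 3))

x = np.zeros(3)                      # initial reservoir state
x = esn_step(x, np.array([1.0]), W_in, W)
y = esn_readout(x, W_out)
```

Only W_out is fitted during training (by linear regression on collected states x(t)); W_in and W stay fixed after random initialization.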
The advancement of deep learning has led to the introduction of stacking architectures in reservoir computing. The Deep echo state network was first introduced in 2017 by Gallicchio et al. [26]. The DeepESN is an expansion of the ESN that stacks many ESNs within a deep learning framework (as shown in Fig. 2). The former aims for simplicity and efficiency, whereas the latter concentrates on the ability to learn complicated features for specific tasks.
An input layer, a dynamic component of stacked reservoirs, and a readout layer make up the DeepESN. The reservoirs of the DeepESN are set up as a hierarchy of stacked recurrent layers, with the output of each layer serving as the input for the layer above it.

Parameters Optimization
The structure of the ESN is shown in Fig. 1, with Nu neurons in the input layer, Nr neurons in the reservoir, and Ny neurons in the output unit. W is the state weight matrix (Nr*Nr) of the neurons in the reservoir, and Win is the input weight matrix (Nr*Nu). The output weight matrix (Nr*Ny) is called Wout.
The Spectral Radius (SR) [6], defined as the largest absolute eigenvalue of the reservoir matrix W, is another discriminant parameter in echo state networks. This parameter gives an indication of an echo state network's short-term memory performance. We can also note that a large spectral radius means a slowly decaying impulse response. Moreover, the SR mirrors the extent of nonlinear interaction of the input units through time, so a large spectral radius means extended interactions. Research has recommended 0 < SR < 1, which helps guarantee that the ESN preserves the echo state property.
Not only is the SR an important parameter to optimize, but so is the input scaling factor IS, which multiplies the input signal before it is fed into the reservoir. In general, the IS controls the amount of nonlinearity of an ESN.
The proportion of connectivity between neurons is represented by the Sparsity Degree (SD) [7] of the inner weight matrix W, which represents the links within the reservoir layer. If SD = 100%, the ESN becomes a typical fully connected RNN. However, to lower the cost of updating the reservoir neuron states, the ESN should use sparsely connected reservoir neurons. The most common choices of sparsity degree [8] used in the literature are between 2 and 10%.
To summarize, the reservoir size, spectral radius, input scaling, and sparsity of the weight matrix all have a significant impact on the ESN's performance. Furthermore, studies show that these parameters are largely independent of one another [IE9]. Therefore, the ability to effectively and precisely tune the parameters that allow the ESN to model nonlinear systems with optimal performance has become the modern-day research battlefield.
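A minimal sketch of how these parameters interact at initialization time (the values 0.05 for the sparsity degree and 0.9 for the spectral radius are illustrative choices within the ranges recommended above):

```python
import numpy as np

def init_reservoir(Nr, sparsity=0.05, spectral_radius=0.9, seed=0):
    """Random sparse reservoir matrix W rescaled to a target spectral radius."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (Nr, Nr))
    # Sparsity degree: keep only a small fraction of connections (e.g. 2-10%)
    mask = rng.random((Nr, Nr)) < sparsity
    W = W * mask
    # Rescale so the largest absolute eigenvalue equals the target SR (< 1)
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    if rho > 0:
        W *= spectral_radius / rho
    return W

W = init_reservoir(100, sparsity=0.05, spectral_radius=0.9)
```

Rescaling by the computed eigenvalue, rather than guessing weight magnitudes, is what makes the SR an explicit, tunable hyper-parameter.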

Search Strategy
The search strategy aims at presenting the systematic literature review of ESN parameter-optimization techniques. Many strategies have been proposed to classify and summarize papers collected from different digital databases.
The methodology followed by this survey is based on the guidelines of [9]. It aims at analysing the methods of optimization of an ESN. Our systematic method is divided into three main phases (as presented in Fig. 3): (1) the aim and research questions, (2) the method of finding and selecting relevant studies, and (3) the knowledge extraction from these publications.
A systematic literature review on how to optimize an echo state network was conducted.The research questions are defined as the initial stage of the selecting procedure.Following are the research questions we have come up with:

Search
The search phase consists of collecting the relevant papers focusing on optimized echo state networks and Deep echo state networks. We chose a set of proper search strings, with which we expected to find the majority of the linked scientific articles. We conducted a manual search throughout five digital databases, including the IEEEXplore Digital Library, SpringerLink Digital Library, ACM Digital Library, and ScienceDirect. In this survey, we concentrated on indexed journals and conference proceedings. To retrieve papers from these databases, we applied the search command below. Before proceeding to the next step, we eliminated duplicate papers. This step yielded 1189 articles selected for being related to the echo state network or deep echo state network and its optimization. In our case, after backward snowballing, 1171 papers were collected and included in the result.

Criteria for Inclusion and Exclusion
We specified criteria for inclusion and exclusion in this step, as indicated in Table 1. Based on these criteria our electronic search was refined. First, we used a set of criteria to pick research papers that contained enough data to address our review issue: ESN optimization. By applying I1 and E1 we noticed that many research papers on echo state networks do not introduce optimization algorithms. In the second step (I2, E2) of selection we focused on papers published between 2004 and 2022, because since 2004 ESNs have outperformed all other nonlinear dynamical modelling methods.
In total, 950 research publications were found using the five electronic search engines. After applying the inclusion and exclusion criteria, we kept 182 papers (Table 2 provides more details).
The results of the filtering process are presented in Table 3. First, we went over all of the titles and abstracts. Then, we categorized each paper using three labels: "potential," "doubt," and "out." At that point, 182 research articles were left for the second round of selection. After reading the introduction and conclusion sections, only 139 papers were kept. Finally, we read the full manuscripts before selecting 129 papers. We used a data extraction form to extract the needed data from the selected papers, which included the data items we thought would be needed to address the review's research questions.
To be included in the list of classified papers, each item must contain at least one term from each group.
Consequently, 129 articles have been categorized, as summarized in Table 4. We classified the analysed papers into two groups of key goals: • The first group deals with bio-inspired optimization methods.
• The second group focuses on non-bio-inspired optimization methods.
The first group of studies comprised those that used biology as a source of inspiration, methods known to be effective at solving such difficulties. This collection of papers focuses on aspects of the echo state network to which biology-inspired methods are applied. The papers that emphasized non-bio-inspired methods fall into the second category.
For each group we have extracted a set of data items. We provide the findings of the data analysis and classification in the next section based on the gathered data. By analysing the extracted data, we can answer the previously defined research questions of our study (cf. Section 3.1). The studies we chose have been divided into two groups (cf. Section 3.3): (1) bio-inspired optimization methods, and (2) non-bio-inspired optimization methods. Table 4 shows the distribution of papers in each category as well as the number of papers in each category. As can be seen, bio-inspired approaches have sparked the most interest.

RQ1: Status of the Field
In the previous section, we extracted data from the 129 selected papers. These papers were published in two different types of venues: conferences and journals. Figure 4 presents the paper types in each category as well as the number of papers of each type. The majority of the studies were published in journals, as evidenced by Fig. 4.
In this SLR, papers are grouped by year of appearance between 2004 and 2022. Echo state networks are a type of recurrent neural network introduced around the beginning of the 2000s [11]. Since its appearance, the ESN has overcome the limitations of RNN training without introducing significant drawbacks, although the ESN model presents clear-cut disadvantages when its parameters are not well initialized. Since 2007, published papers have become more common compared to preceding years. The year 2011 can be regarded as a critical milestone because of the growth in published articles. Additionally, the number of works published in 2018, 2019, 2020, 2021 and early 2022 shows the growing interest in this area until now (cf. Fig. 5). We expect that ESN optimization algorithms will attract great interest in the future, with more works published in specialised conferences and special issues of journals.

RQ2: The Most Frequent Methods in Optimizing an echo state network
To answer this research question, we have extracted from the reviewed studies the data (defined in Sect. 3.3) about RQ2 for each category (i.e., the standard ESN and the deep ESN), visible in Tables 5 and 6.

Optimization Techniques for the Standard ESN
The echo state network was created to overcome the difficulty of gradient-descent training of Recurrent Neural Networks [12]. The selection of ESN parameters, on the other hand, frequently requires optimization, and several methods are used to optimize the echo state network in order to increase its performance.
In this context, this analysis was conducted using the extracted data, and Table 5 visualizes the scores of each optimization method. From Table 5 we can conclude that metaheuristic (MA) techniques [13] have the potential to improve the ESN's parameters [14]. Several studies demonstrate that the use of MA methods leads to a higher accuracy and convergence rate than other analogues. Besides, MAs, or evolutionary algorithms, are known to have few tuning parameters of their own; for this reason their implementations are quite simple.
By analysing Table 5 and Fig. 6 we can conclude that Particle Swarm Optimization is the most used MA technique. Research has shown that this relatively young optimization method has a very short computation time and is very easy to program, which makes it powerful. PSO employs a swarm of agents (particles) that move through the search space looking for the optimal solution [28]. Each particle changes its "flight" based on its own and other particles' previous flight experiences. Each particle tracks its best solution so far (the personal best, pbest), and the swarm tracks the best value found by any particle (the global best, gbest). A velocity update in each iteration determines the new position. [15] provides an overview of PSO approaches.
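A minimal PSO sketch of the update just described (the toy objective below stands in for an ESN validation error over two hyper-parameters; the coefficients w, c1, c2 are common illustrative choices, not values from any reviewed paper):

```python
import numpy as np

def pso(objective, bounds, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO: each particle tracks its personal best (pbest);
    the swarm tracks a global best (gbest)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pos = rng.uniform(lo, hi, (n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Velocity update: inertia + attraction to pbest + attraction to gbest
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, pbest_val.min()

# Hypothetical objective standing in for ESN validation error over
# (spectral_radius, input_scaling); the true optimum here is (0.9, 0.5)
obj = lambda p: (p[0] - 0.9) ** 2 + (p[1] - 0.5) ** 2
best, best_val = pso(obj, np.array([[0.1, 1.2], [0.0, 2.0]]))
```

In ESN tuning, the objective would train a readout with the candidate hyper-parameters and return the validation error, which is why PSO's gradient-free search fits the problem.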
PSO is not the only promising method: other evolutionary computation techniques [16] were found to be competitive for optimizing the ESN's parameters. Genetic algorithms are among these algorithms [17]. They are optimization algorithms based on genetics and natural evolution: crossover, mutation, selection, etc. Genetic algorithms already have a relatively long history, going back to the early work of J.H. Holland [17]. A genetic algorithm searches for the optimum (or optima) of a function defined on a data space.
To make use of it, we need the following five things: (1) A coding principle for the population elements. (2) A method for generating the starting population; this process must be able to produce a population of non-homogeneous individuals that can serve as a foundation for future generations. (3) A function to optimize; this yields a fitness value for each individual. (4) Operators that allow for population diversification across generations and exploration of the space: the crossover operator recombines the genes of existing individuals in the population, while the mutation operator ensures state-space exploration. (5) Dimensioning parameters: population size, total number of generations or stopping criterion, and application probabilities of the crossover and mutation operators.
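As an illustration of these five ingredients, a minimal real-coded GA sketch (tournament selection, uniform crossover and Gaussian mutation are illustrative operator choices, not those of any reviewed paper):

```python
import numpy as np

def ga(fitness, bounds, pop_size=30, n_gen=40, p_cross=0.8, p_mut=0.1, seed=0):
    """Minimal real-coded GA minimizing `fitness` over a box-bounded space."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, (pop_size, len(lo)))      # (2) starting population
    for _ in range(n_gen):
        fit = np.array([fitness(ind) for ind in pop])    # (3) fitness evaluation
        new = []
        for _ in range(pop_size):
            # (4a) Tournament selection of two parents
            i, j = rng.integers(pop_size, size=2)
            a = pop[i] if fit[i] < fit[j] else pop[j]
            i, j = rng.integers(pop_size, size=2)
            b = pop[i] if fit[i] < fit[j] else pop[j]
            # (4b) Uniform crossover with probability p_cross
            child = np.where(rng.random(len(lo)) < 0.5, a, b) if rng.random() < p_cross else a.copy()
            # (4c) Gaussian mutation, clipped to the search space
            mut = rng.random(len(lo)) < p_mut
            child[mut] += rng.normal(0, 0.1, mut.sum())
            new.append(np.clip(child, lo, hi))
        pop = np.array(new)
    fit = np.array([fitness(ind) for ind in pop])
    return pop[np.argmin(fit)], fit.min()

# Hypothetical 1-D fitness standing in for ESN validation error,
# minimized at a spectral radius of 0.9
best, best_val = ga(lambda p: (p[0] - 0.9) ** 2, np.array([[0.0, 1.5]]))
```

The coding principle (1) here is simply a real-valued vector of hyper-parameters; topology-optimizing works instead encode connectivity as a binary genome.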
Evolution Strategy, Differential Evolution, Backtracking Search, and other bio-inspired techniques are also used to optimize the parameters of the ESN.
Another intriguing result is that some studies have shown that intrinsic plasticity can be used to enhance the reservoir. The most prevalent brain plasticity processes are synaptic plasticity [18] and intrinsic plasticity [19], the latter characterizing a neuron's ability to change the type of response it gives to the same input. Changes can affect either the intrinsic properties of the neuron or the properties of its synapses: in the first case we speak of intrinsic plasticity, in the second of synaptic plasticity. Neuronal plasticity constitutes the unitary mechanism of memorization and learning.
On the side of non-bio-inspired techniques, we can note that Bayesian optimization (BO) has strong potential. BO [20] is a surrogate-based optimization technique that uses a gradient-free global search to optimize arbitrary functions [21]. It was introduced to minimize loss functions of artificial neural networks whose (hyper-)parameters are difficult to tune, when gradients are unreachable, or when the objective function is non-convex (many local optima). Additionally, Table 5 indicates a remarkable presence of gradient descent algorithms [22] in their two variants (gradient descent and stochastic gradient descent).
The analysis and comparison of each approach were compiled in Table 6 based on the studied papers. Additionally, to help researchers choose an optimization method, this SLR examines which technique was used in which situation on popular benchmarks. Table 7 revealed that PSO had better forecast accuracy. Again, we note the use of GA as a bio-inspired method. Bayesian optimization, the State Transition Algorithm and the Broyden-Fletcher-Goldfarb-Shanno quasi-Newton algorithm were also used to further lower the cost of optimizing the DeepESN hyper-parameters.

RQ3: The Commonalities Between the Different Proposed Techniques of Optimization
To answer this research question, we have extracted data items (3) and (4) from the reviewed studies (defined in Sect. 3.3). Table 8 presents the captured data about RQ3 for each category (i.e., the standard ESN and the DeepESN).

The Standard ESN
See Table 8. A first group of studies focuses on tuning the network's hyper-parameters directly, while a second group focuses on improving the network's topology, such as weight connections and reservoir connectivity.
In the first scenario, the parameter representation can be a one-dimensional vector of sufficient length to fit the hyper-parameters to be optimized: for example [P1, P2, ..., Pn], where Pi is the i-th parameter to be optimized and n denotes the number of parameters.
In the second case, the weights can be encoded as a two-dimensional binary matrix whose rows and columns represent neurons: in a matrix C, Cij denotes the connection between neuron i and neuron j; Cij = 1 indicates the presence of a link between neuron i and neuron j, whereas Cij = 0 indicates the absence of such a connection.
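A tiny illustration of this encoding (the matrix values are arbitrary, chosen only to show the masking idea):

```python
import numpy as np

# Binary connectivity matrix for a 4-neuron reservoir:
# C[i, j] = 1 means a link between neuron i and neuron j, 0 means no link.
C = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])

# The actual weight matrix masks random weights with the evolved topology
rng = np.random.default_rng(0)
W = C * rng.uniform(-1, 1, C.shape)

sparsity = C.sum() / C.size   # fraction of existing connections
```

A topology-optimizing algorithm would evolve C (the genome) while the nonzero weight values remain random, which keeps the search space binary and compact.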

The Deep ESN
In practice, a deep ESN is a neural network with multiple untrained reservoir layers stacked one on top of the other; it was first introduced only a few years ago. The study of hierarchically structured deep neural network architectures is a rapidly expanding area of research. One of the most important factors is the significant amount of computational resources that deep neural network training normally requires; hyper-parameter tuning is an even more important factor in this context.
(1) Structure designs: In some studied papers, the application of the deep ESN leads to deformations of the fundamental DeepESN structure. Several previous investigations concentrated on changing the way reservoir layers are stacked in the deep network and the connections between stacked reservoirs. Similar to a wide structure of several reservoirs, a Wide DeepESN may be arranged from numerous modules of stacked reservoirs.
(2) Network analysis: The behaviour of dynamical systems, like reservoir layers, can range from stable to chaotic. A system is chaotic if random minor state perturbations at a given moment continue to impact the system's state for a very long time rather than disappearing, and stable if they do not [24]. The edge of stability (or edge of chaos) refers to the area in parameter space where the split between stability and chaos occurs; systems at this phase transition exhibit good behaviour. In order to improve the richness of deep ESNs, some works measure the averaged entropy of the hidden-layer states and the proximity to the edge of stability using different supervised and unsupervised techniques. Information storage capability, or short-term memory, is a crucial characteristic of systems that operate on time-series data. A measurement of it, known as Memory Capacity (MC) [24], consists of evaluating the network's capacity to recall increasingly delayed input samples. Commonalities in deep ESNs are presented in Table 9 and Fig. 6.
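The Memory Capacity measure can be estimated empirically as the sum, over delays k, of the squared correlation between the input delayed by k steps and its best linear reconstruction from the reservoir states. A self-contained sketch (the reservoir size, delays and scalings are illustrative choices, not values from any reviewed paper):

```python
import numpy as np

def memory_capacity(Nr=50, T=2000, max_delay=30, washout=100, seed=0):
    """Estimate MC = sum_k corr(u(t-k), u_hat(t-k))^2 for a random ESN reservoir."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1, 1, T)                     # i.i.d. input signal
    # Random reservoir rescaled to spectral radius 0.9
    W = rng.uniform(-1, 1, (Nr, Nr))
    W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = rng.uniform(-0.5, 0.5, Nr)
    X = np.zeros((T, Nr))
    x = np.zeros(Nr)
    for t in range(T):
        x = np.tanh(W_in * u[t] + W @ x)
        X[t] = x
    mc = 0.0
    for k in range(1, max_delay + 1):
        # Target: the input delayed by k steps; fit a readout by least squares
        S, y = X[washout:T], u[washout - k:T - k]
        w, *_ = np.linalg.lstsq(S, y, rcond=None)
        c = np.corrcoef(y, S @ w)[0, 1]
        mc += c ** 2
    return mc

mc = memory_capacity()
```

The theoretical bound mentioned later in the discussion, that MC cannot exceed the number of reservoir units, can be checked directly against this estimate.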

Numerous works investigate different deep ESN topologies and other ways of structuring the input. As shown in Fig. 7, the number of works that address the design of network architectures is large. These papers include the initialization of hyper-parameters and of network connectivity. Moreover, investigating the impact of the network topology and the fundamental hyper-parameters on task outcomes is a basic job in this type of research.

RQ4: The Most Frequent Datasets Used in the Proposed Algorithms
ESNs were demonstrated to have a number of benefits for both synthetic and real-world tasks when it comes to the experimental analysis in applications.

The Most Benchmark Datasets Used in the Proposed Algorithms
In this section, we focused on the most used benchmark datasets in the studied papers.
(1) NARMA and (2) Mackey Glass: the two most frequent datasets in our studied papers are NARMA and Mackey-Glass, two widely used benchmarking sequences. NARMA is a nonlinear autoregressive moving average system [23]. It is one of the most often used benchmarks, characterized by its complex, hard-to-predict sequences and its weakly correlated inputs. Its dynamic expression is given by Eq. (3): the output and input of the system at time t are denoted by y(t) and x(t), respectively, and initial values are given to the constant parameters ci. The complexity of NARMA is determined by the order parameter k. In most cases, k is set to 10 or 30, the two most popular values in the literature.
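Since Eq. (3) is not reproduced here, the following generator uses the NARMA-k recurrence most commonly cited in the literature (coefficients 0.3, 0.05, 1.5 and 0.1 for k = 10); readers should verify it against the exact form of Eq. (3):

```python
import numpy as np

def narma(k=10, T=1000, seed=0):
    """NARMA-k benchmark: y(t+1) = 0.3 y(t) + 0.05 y(t) sum_{i=0..k-1} y(t-i)
    + 1.5 u(t-k+1) u(t) + 0.1, driven by i.i.d. input u in [0, 0.5]."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0, 0.5, T)
    y = np.zeros(T)
    for t in range(k, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - k + 1:t + 1].sum()
                    + 1.5 * u[t - k + 1] * u[t]
                    + 0.1)
    return u, y

u, y = narma(k=10, T=1000)
```

The long product and sum terms are what make the task require both nonlinearity and roughly k steps of memory, which is why NARMA is used to probe reservoir quality.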
(3) The Lorenz time-series prediction: a dataset used as a benchmark for ESNs [8], given by Eq. (4). The Lorenz system is a nonlinear equation system because of the terms xy and xz. This system cannot be integrated analytically in the general case; it must be solved using numerical approximation methods. When the parameters σ, r and b take the values σ = 10, r = 28 and b = 8/3, with the initial conditions x(0) = y(0) = z(0) = 0.01, the system is chaotic. We can also note that many other benchmarks are used in this field. Some of them are private data, such as NASA's turbofan engine data, while others are publicly accessible, such as Iris.
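The chaotic series can be generated numerically; a sketch using a classical fourth-order Runge-Kutta integrator with the parameter values and initial conditions stated above (the step size and series length are illustrative):

```python
import numpy as np

def lorenz(T=2000, dt=0.01, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Integrate the Lorenz system dx/dt = sigma(y-x), dy/dt = x(r-z) - y,
    dz/dt = xy - bz with a 4th-order Runge-Kutta scheme."""
    def f(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (r - z) - y, x * y - b * z])
    s = np.array([0.01, 0.01, 0.01])     # initial conditions used in the text
    out = np.empty((T, 3))
    for t in range(T):
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        s = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out[t] = s
    return out

series = lorenz()
```

In benchmarking, one component of the generated trajectory (often x) is used as the target of a one-step-ahead prediction task.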

The Most Real-World Applications Used in the Proposed Algorithms
In this section, we check whether the models under investigation have been validated using real data or tested in the context of useful applications. The study of real data is the fundamental technique for evaluating a proposed approach. Table 10 and Fig. 8 summarize the obtained results.
Recent studies have shown that the ESN techniques work well for a range of real-world challenges.
(1) Human Activity Recognition: In smart homes, activity support and care are essential components of activity recognition. In order to successfully learn input-output temporal connections from a potentially enormous amount of noisy and imprecise heterogeneous streams of sensed data, neural network models for sequential domains become strong candidates for human activity detection challenges. Consequently, many studies use echo state networks to model the stream acquired from a Wireless Sensor Network (WSN).
(2) Clinical applications: Future research and clinical applications using ESNs are looking promising. In fact, RC has been successfully applied for clinical purposes including medical disease detection [IE40], prediction of blood glucose concentration for type 1 diabetes [SD31], analysing mental disorders [SD32], and many more medical procedures.
(3) Distributed, embeddable and federated learning: Federated learning is a decentralized learning technique that can work with training datasets that are dispersed across several machines [25][26][27].The implementation of distributed learning via wireless networks has recently been the subject of several previous studies.The predictive accuracy of the aggregated model is not guaranteed by common federated learning techniques for recurrent neural networks (RNNs).Consequently, many works demonstrate how an effective form of federation can be achieved by using echo state networks (ESNs), which are cutting-edge RNN models that are highly effective for processing time-series data.ESNs produce models that are mathematically equivalent to the corresponding centralized model.
(4) Distributed Intelligence Applications: Systems of entities collaborating to reason, plan, solve problems, think abstractly, understand concepts and language, and learn are referred to as distributed intelligence. Here, we define an entity as any kind of intelligent process or system, including agents, people, robots, smart sensors, and so on. In these systems, many entities frequently specialize in specific facets of the current task; as humans, we are all accustomed to distributed intelligence in groups of other individuals. A particular focus has been placed on RNN models because they appear to be a promising strategy for time-series modelling, and among them echo state network models have been chosen: any type of event may be predicted and modelled using the ESN model.
(5) Autonomous Vehicles: In recent years, electric cars have garnered the most attention as the most significant new-energy vehicles. Electric car research has been conducted by several firms and academics with positive outcomes. Due to its lower node count and simpler computing needs, the ESN has been applied to several nonlinear systems in this field. Teng et al. [IE42] proposed an adaptive ESN control for multi-step-ahead prediction of vehicle lateral dynamics. It has also been proved that ESNs [SD34] [SP26] have achieved success in this field.
(6) Robot localization in critical environments: Robotics has developed specialized methods that may be used to locate mobile robots, such as those that rely on cameras or laser range finders. Many of these methods still have drawbacks, though, such as the need to locate the robot at each start-up and reliability that falls short of the minimum standard. In this sector, the ESN has been used with a number of nonlinear systems.
Dragone et al. [SP18] developed an RSS-based Robot Localization in Critical Environments using Reservoir Computing.
(7) Human Activity Monitoring: Systems for monitoring human activity are created as part of a framework that makes it possible to continuously monitor human behaviour in areas including ambient assisted living, sports injury detection, aged care, rehabilitation, entertainment, and security in smart home environments. Other facets of the recurrent neural network were revealed in later investigations; for instance, Bacciu et al. [SP26] forecasted human indoor mobility.
(8) Driving-Style Personalization Based on Driver Stress: The cognitive state of a human during a driving experience is influenced by how the vehicle and its surroundings are seen, as well as by the individual's own beliefs, preferences, and expectations for an autonomous driving situation.Several studies on perception and the resulting mental state [IE46], [IE47], [SD36] combine data from a wide range of sensors attached to the vehicle (e.g., accelerometers, gyroscopes, etc.) and the driver in an effort to identify the driving style and relate it to the comfort or stress level of the driver/passengers.
(9) Brugada Syndrome: In a recent study [IE41], Brugada Syndrome was predicted from electrocardiogram (ECG) data using ESNs. The study builds on the project "Brugada Syndrome and Artificial Intelligence Applications to Diagnosis", which aims to create a cutting-edge system for the early diagnosis and classification of Brugada Type 1 (Fig. 9).

Discussion
This paper presents a survey of recent research focusing on techniques for optimizing the parameters of echo state networks.

Findings
The main findings of this study are the following. ESN optimization algorithms have attracted growing interest over the last years, with a large number of published items. This is possibly because the ESN is still a young field of research, and reservoir computing has proved its efficiency on time-series tasks.
Furthermore, bio-inspired approaches have been employed extensively to tune ESN parameters. The optimization parameters that have a considerable impact on the ESN's performance are the size of the reservoir, the connectivity rate, and the spectral radius. The size of the reservoir is important because it determines the capacities of the reservoir, both computational and memory capacity. As with other neural networks, too many units can lead to over-learning and wasted computing time. A reservoir's memory capacity cannot exceed its number of units; this must be considered in order to integrate enough units to match the task's requirements.
The connectivity rate is also an important parameter; it specifies the number of connections between the units of the reservoir. Strong connectivity indicates that the units are all connected to each other, while weak connectivity [12] indicates that each unit is connected to only a small number of other neurons. A low-connectivity reservoir shows slightly better performance, although connectivity does not greatly affect the capacities of a reservoir. However, a weakly connected reservoir has a matrix W made up largely of zeros.
Thus, with a sparse-matrix representation, a weakly connected reservoir allows its state updates to be computed more quickly. It is therefore advisable to use a weakly connected reservoir (around 10% connectivity), regardless of its size, in order to save computation time. The spectral radius is another decisive parameter; it is the largest eigenvalue of W in absolute value. In a vast number of publications, a spectral radius below unity (spectral radius < 1) is commonly taken as a condition for the echo state property. Based on the studies reviewed, we may deduce that spectral radius optimization is computationally expensive, and that researchers have only recently found ways to optimize the reservoir using metaheuristics.
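The two recommendations above (roughly 10% connectivity and a spectral radius below one) can be combined in a few lines. The sketch below is a minimal illustration assuming NumPy; the function name and default values are our own, not taken from the reviewed papers.

```python
import numpy as np

def make_reservoir(n_units, connectivity=0.10, spectral_radius=0.9, seed=0):
    """Build a sparse random reservoir matrix and rescale it so that its
    largest absolute eigenvalue equals the requested spectral radius.
    (Illustrative sketch; name and defaults are our own choices.)"""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(n_units, n_units))
    # Keep ~10% of the connections (weak connectivity), zero out the rest.
    mask = rng.random((n_units, n_units)) < connectivity
    W = W * mask
    # Rescale: eigenvalues scale linearly with W, so dividing by the
    # current spectral radius and multiplying by the target is exact.
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / rho)

W = make_reservoir(200)
```

Stored in a sparse format (e.g. CSR), such a matrix makes each state update cost proportional to the number of nonzero weights rather than to the square of the reservoir size, which is the computational saving discussed above.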
PSO is the most widely used technique for optimizing both the ESN and the DeepESN. This may be due to its small number of parameters and its simplicity of implementation. PSO is indeed versatile enough to serve as the optimization algorithm of choice for reservoir computing methods.
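As an illustration of why PSO is easy to apply here, the sketch below implements a minimal global-best PSO and uses it to tune two hypothetical ESN hyper-parameters (spectral radius and input scaling) against a stand-in objective; in a real setup the objective would be the validation error of an ESN trained with those hyper-parameters. All constants and names are our own illustrative choices.

```python
import numpy as np

def pso(objective, bounds, n_particles=20, iters=60, seed=0):
    """Minimal global-best particle swarm optimizer. The inertia and
    acceleration constants below are common textbook values, not
    settings prescribed by the reviewed papers."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    dim = len(bounds)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    w, c1, c2 = 0.7, 1.5, 1.5            # inertia and acceleration weights
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)  # keep particles inside the bounds
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

# Stand-in objective: pretend the validation error is minimized at a
# spectral radius of 0.9 and an input scaling of 0.3 (hypothetical values).
err = lambda p: (p[0] - 0.9) ** 2 + (p[1] - 0.3) ** 2
best, val = pso(err, bounds=[(0.1, 1.2), (0.0, 1.0)])
```

The whole optimizer needs only three tunable constants, which is precisely the simplicity that makes PSO attractive for reservoir computing.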
A first intrinsic benefit of depth in RC is the ability to create increasingly abstract representations of the driving input. In the temporal domain, this implies that individual layers can concentrate on different time scales, so that the network as a whole can represent temporal information at multiple time scales.
Researching the structural design of the DeepESN is a valuable area of interest in RC. The goal is to create "better" reservoir architectures. Recent research on deep RC has further emphasized how certain architectural features contribute to the development of enriched dynamics. Results in [24] highlighted the importance of an appropriate scaling of the inter-layer connections, i.e., of the weights in the inter-layer matrices. This scaling was shown to significantly affect the quality of the dynamics in the higher layers of the network.
Studies examining the richness of deep reservoir dynamics have taken into account the stability of dynamical systems and local Lyapunov exponents. The key finding in this respect is given in [24], where it was demonstrated analytically and empirically that layering the same number of recurrent units naturally drives the resulting system dynamics toward the critical point. From a related vantage point, deep RC settings were found to increase short-term memory capacity compared with corresponding shallow designs [24].
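The layered dynamics discussed above can be sketched as a two-layer DeepESN in which the second reservoir is driven by the first layer's state through an explicitly scaled inter-layer matrix; it is this kind of scaling that [24] found to be decisive for the quality of the higher-layer dynamics. The code is a minimal illustration assuming NumPy, with all sizes and scaling factors chosen arbitrarily for the example.

```python
import numpy as np

def layer(n, rho, seed):
    """Random sparse reservoir rescaled to spectral radius rho."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (n, n)) * (rng.random((n, n)) < 0.1)
    return W * (rho / np.max(np.abs(np.linalg.eigvals(W))))

n, T, inter_scale = 50, 300, 0.5   # inter_scale: illustrative inter-layer scaling
rng = np.random.default_rng(0)
W1, W2 = layer(n, 0.9, 1), layer(n, 0.9, 2)
W_in = rng.uniform(-0.5, 0.5, n)                      # input -> layer 1
W_il = rng.uniform(-0.5, 0.5, (n, n)) * inter_scale   # layer 1 -> layer 2

u = np.sin(0.2 * np.arange(T))
x1, x2 = np.zeros(n), np.zeros(n)
states = []
for t in range(T):
    x1 = np.tanh(W1 @ x1 + W_in * u[t])   # layer 1 sees the raw input
    x2 = np.tanh(W2 @ x2 + W_il @ x1)     # layer 2 sees only layer 1's state
    states.append(np.concatenate([x1, x2]))
states = np.array(states)
```

Because layer 2 receives the input only through layer 1's slower, filtered state, its dynamics operate on a different effective time scale, which is the multiple-time-scale behaviour described above; varying `inter_scale` changes how strongly the upper layer is driven.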
ESNs have been widely used to address real-world problems in clinical applications, teaching, human activity recognition, autonomous vehicles, and a variety of other areas. Additionally, ESNs are well suited for forecasting and prediction purposes.
Another remarkable finding is that the vast majority of reviewed works use at least one of four benchmark problems: NARMA, Mackey-Glass, Lorenz time-series prediction, and short-term wind speed. These benchmark datasets have been routinely used to test time series prediction methods. The chaotic properties of the underlying systems, as well as their extended memory, make forecasting challenging [23]. Furthermore, these datasets are used to quantify the ESN's learning abilities.
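Of these benchmarks, NARMA is the simplest to reproduce. The sketch below generates the widely used 10th-order NARMA (NARMA-10) sequence from i.i.d. uniform inputs; note that this unsaturated recurrence can occasionally diverge for unlucky input sequences, in which case the sequence is usually regenerated with a different seed.

```python
import numpy as np

def narma10(T, seed=0):
    """Generate the standard 10th-order NARMA benchmark.
    Inputs u(t) are i.i.d. uniform on [0, 0.5], as in the usual setup;
    the task is to predict y(t) from the input history."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, T)
    y = np.zeros(T)
    for t in range(9, T - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * np.sum(y[t - 9:t + 1])
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return u, y

u, y = narma10(2000)
```

The order-10 dependence (the sum over the last ten outputs and the input delayed by nine steps) is exactly what makes this task a probe of a reservoir's short-term memory.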

Research Gaps and Future Work
Observing the varying performance of the optimization algorithms, the fundamental disadvantage of these reservoir optimization approaches is that they either use too many factors, such as optimizing all the reservoir weights, or they improve only a few effective characteristics. Both situations are problematic. In the first scenario, with too many optimization parameters, the optimization process is costly to run, which is impractical in some cases. In the second, where only a few effective parameters are modified, the performance of the reservoir cannot be considerably enhanced.
Particular research in neuroscience and in reservoir computing focuses on plasticity, which represents the capacity of a network to adapt its units and parameters. Several types of plasticity exist. Intrinsic plasticity regulates the excitability of neurons and induces stable homeostatic effects in the dynamics of an ESN. Synaptic plasticity modifies synaptic weights to regulate the impulses received by a neuron; however, its application has remained largely unsuccessful. Finally, synaptic normalization appropriately adjusts the levels of a neuron's incoming connections so that the aggregate of all neurons' activations remains consistent.
Several strategies are used to prune unnecessary reservoir connection weights during training to improve ESN performance. However, pruning operations on the reservoir cannot be guaranteed to preserve the echo state property. Meanwhile, many academics focus on cutting redundant connections from the reservoir to the output layer to optimize the ESN structure. The most commonly used strategies for optimizing network structure are regularization methods. These techniques can improve the echo state network's generalization ability while simultaneously addressing the issue of over-learning. Nonetheless, given the dynamical organization of reservoirs and the number of training patterns, determining the appropriate value for the regularization parameter is difficult.
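The regularization method most commonly used for the readout is ridge regression, where the hard-to-choose parameter mentioned above appears as the coefficient lambda. The sketch below shows the closed-form ridge solution for the output weights; the default value of the parameter is purely illustrative, since, as noted, the right value depends on the reservoir dynamics and the number of training patterns.

```python
import numpy as np

def train_readout(states, targets, ridge=1e-6):
    """Train ESN output weights by ridge regression:
    W_out = (X^T X + lambda I)^(-1) X^T Y.
    `ridge` is the regularization parameter discussed in the text;
    1e-6 is an arbitrary illustrative default, not a recommended value."""
    X = states                              # shape (T, n_units)
    n = X.shape[1]
    A = X.T @ X + ridge * np.eye(n)         # regularized normal equations
    return np.linalg.solve(A, X.T @ targets)

# Toy check: recover a known linear readout from synthetic "states".
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
w_true = rng.standard_normal(20)
W_out = train_readout(X, X @ w_true)
```

With lambda near zero this reduces to ordinary linear regression (and its over-learning risk); larger values shrink the weights and improve generalization at the cost of bias, which is exactly the trade-off the regularization parameter controls.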
Additionally, evolutionary computation algorithms have been employed to optimize the ESN structure. Particle swarm optimization and genetic algorithms, as described in Sect. 3.2, are used to optimize the ESN's weight connection topologies. Evolutionary computation techniques, on the other hand, are heuristic algorithms with high computational complexity. This strategy therefore raises another question: what is the most efficient method for cutting redundant connection weights and improving ESN performance?
In future research, we may look into more appropriate intelligent algorithms for optimizing the W out of the echo state network, because training the output weights of an ESN with the traditional linear regression method can cause the trained network to overfit. The use of metaheuristics to optimize ESN architectures may solve this problem. We note, however, that these techniques are heuristics whose objective differs from that of exact methods: in particular, they do not ensure the discovery of an optimal solution, nor proof of the optimality of the solution identified, unless a lower bound of the cost function is known and the solution obtained meets this criterion.
Moreover, the next generation of RC may focus on measuring the activation entropy of recurrent units and the edge of stability using different techniques. RC systems close to instability exhibit the best performance when the task at hand requires extended short-term memory.
The field of RC can be extended in the future to deeply study deep reservoir computing, embedded software, stable ESN architectures, quantum artificial intelligence, and graph neural networks.
ESNs have been applied to various real-world problems; RC may also come to address one of the biggest problems in the world, namely climate change.

Conclusion
Our SLR concerns the ESN and the DeepESN, which are considered Recurrent Neural Network variants with a faster training speed because only the W out is trained. To improve the ESN's performance in both its shallow and deep variants, many research works have proposed optimizing its parameters. Several algorithms have been introduced, but the best performing are the bio-inspired ones, the first to be applied in reservoir computing. Furthermore, we have highlighted the potential of PSO, which is particularly effective for nonlinear, continuous, integer, or mixed optimization problems. The study of reservoirs and their performance according to their specific characteristics constitutes a broad part of the work done in this area. For this, benchmarks are commonly used in the literature to test reservoirs with different parameters, different topologies, or different training methods, in order to estimate their impact. Three of the most frequently used benchmarks are NARMA, Mackey-Glass, and Lorenz time-series prediction. Moreover, recent studies have shown that ESN techniques work well for a range of real-world challenges. The main research gaps identified are the impact of the number of optimized parameters, whether the plasticity of an ESN affects its performance, the optimization of the ESN structure, and the optimization of the readout weights.

Fig. 2 Deep echo state network structure

• RQ1: How far has this field of study progressed?
• RQ2: What are the most frequent methods for optimizing an echo state network, and which techniques perform best on benchmark datasets?
• RQ3: What are the commonalities between the different proposed optimization techniques?
• RQ4: What are the most frequent datasets used in the proposed algorithms?

Fig. 3 Article distribution by digital databases

Fig. 6 Distribution of optimization methods by studied papers

Fig. 8 Distribution of well-known benchmark datasets by studied papers

Fig. 9 Distribution according to real-world applications

Table 1
Inclusion and exclusion criteria

Table 2
Selection of papers

Table 4
Distribution of selected papers by category

Table 5
Distribution of selected papers by used techniques

Table 6
Top-performing architecture comparison on benchmark dataset

Table 7
Distribution of selected papers by used techniques

2 Optimization Techniques for the DeepESN
[IE48] Recurrent model research is still at an early stage. In this SLR, we have suggested a review of the optimization methods in deep recurrent architectures. Consequently, all current studies still focus on establishing essential preconditions for the Echo State Property (ESP) of DeepESNs and on extending and generalizing existing findings from the standard RC literature to the situation of multilayer RC networks. In this context, the basic methods for finding the best design for the network are practically the same as for the standard ESN; for instance, the authors in [IE48][IE52][SD38] evaluated the PSO method. Table

Table 9
[SP29] ESNs may be created to accept various data components. A bidirectional reservoir was built by Ibrahim et al. [SP29], whose final state additionally includes the input's historical dependencies.

Table 10
Distribution of real-world problems by studied papers