A Survey and Analysis of Extreme Machine Learning Models and Its Techniques

Machine learning applications make extensive use of feed-forward neural networks (FFNNs). However, it has been observed that the training speed of FFNNs is often inadequate. There are two fundamental causes of this problem: 1) slow gradient-descent-based methods are widely used to train neural networks, and 2) such methods require iterative tuning of the hidden layer parameters, including weights and biases. To resolve these problems, an emerging learning algorithm for feed-forward networks, the Extreme Learning Machine (ELM), has been introduced, and it is surveyed in this paper. ELM also provides a general learning scheme for a wide variety of networks (SLFNs and multilayer networks). According to the originators of ELM, networks trained with backpropagation can learn as much as a thousand times more slowly than networks trained with ELM, and ELM models also exhibit good generalization performance. ELM is reported to be more efficient than the Least Squares Support Vector Machine (LS-SVM), the Support Vector Machine (SVM) and other established approaches. The original design of ELM has three main targets: 1) high learning accuracy, 2) little human intervention, and 3) fast learning speed. ELM is also considered more likely to reach a global optimum. Applications of ELM include feature learning, clustering, regression, compression and classification. The goal of this paper is to introduce the various ELM variants, their applications, the strengths of ELM, ELM research and its comparison with other learning algorithms, along with other concepts related to ELM.
Keywords: extreme learning machine, artificial neural networks, support vector machine, least squares support vector machine, single-hidden-layer feed-forward networks.


Introduction
A feed-forward neural network (FFNN) is a kind of artificial neural network (ANN) in which the connections between units do not form loops. This type of network is mainly used for supervised learning and is a biologically inspired classification algorithm [1] (as shown in Fig 1). FFNNs were originally trained with the well-known backpropagation (BP) algorithm (Rumelhart, Hinton & Williams, 1986), which suffers from two major problems: slow convergence and convergence to local minima. Since then, numerous approaches have been developed to train FFNNs efficiently and optimally, including second-order optimization methods (1994, 2010), subset selection methods (1991, 2005) and global optimization methods (1993, 1995). These approaches outperform BP in training speed and generalization performance, although they are not guaranteed to reach the global optimum [2] (as shown in Fig 2). According to the originators of ELM, networks trained with backpropagation can learn as much as a thousand times more slowly than networks trained with ELM, and ELM models also exhibit good generalization performance. ELM is also considered more likely to reach a global optimum. ELM's speed comes from eliminating the iterative tuning of hidden node parameters, and ELM also satisfies Bartlett's theory (as shown in Fig 3). In ELM, "extreme" indicates a move away from typical learning techniques in the direction of brain-like learning [4]. ELM was initially proposed for single-hidden-layer feed-forward networks (SLFNs). ELM addresses two fundamental problems: 1) different types of networks (SLFNs and multilayer networks) require different types of learning algorithms; and 2) conventional learning requires tuning a large number of hidden neurons, while it is unclear whether biological neurons (whose modelling is still unknown) require such tuning [5].
The different types of SLFNs include RBF networks, polynomial networks, sigmoid networks, complex (domain) networks, wavelet networks, Fourier series, etc. There are basically three types of approaches to training feed-forward networks: a) the gradient descent approach (used by backpropagation (BP) for multilayer feed-forward networks), b) the standard optimization approach (used by the support vector machine (SVM)), and c) the least-squares-based approach (used by ELM for generalized SLFNs). Fig 4 shows the difference between the optimization strategies of conventional learning methods and ELM. ELM emphasizes three main components: real-time learning, high accuracy and least user intervention (as shown in Fig 5). ELM is also related to random projection and to feature space methods. Prior to ELM theory, neural networks with randomly generated hidden nodes had several issues: with fully random nodes, their separation capability had not been substantiated, optimization constraints had not been considered, the dimensionality of the hidden mapping was smaller than the number of training samples, and their universal approximation capability had not been verified or proved. Likewise, feature space methods have some flaws, such as an unclear relationship with neural networks and a universal approximation capability that has not been validated. ELM bridges the gap between these neural networks and feature space methods, and also between these methods and biological learning [5] (as shown in Fig 6 and Fig 7). The most effective ELM application areas are regression, classification, clustering and feature learning.
ELM also exhibits the following characteristics. It extends to almost any nonlinear piecewise continuous function, to kernels and to higher-dimensional hidden mappings; it proves the universal approximation and separation capability of generalized SLFNs; it applies generalization theory to learning optimization; and it demonstrates the linear independence of random hidden neurons. It further shows that hidden node parameters that are independent of the training data can serve as a bridge between ridge regression, system stability, neural network generalization theory and maximum margin, and network optimization constraints are also included in the ELM framework. Figure 7 illustrates ELM characteristics in terms of generalized SLFNs and multi-hidden-layer networks, structural risk minimization and ridge regression theories, homogeneous learning algorithms, and kernels/random hidden nodes/neurons inherited from ancestors. The hidden neurons in ELM can be assigned randomly and need not be updated, or they can be inherited from their ancestors without modification. As with linear models, learning in ELM takes just one step: the output weights of the hidden nodes are computed in a single step. ELM theories show that these hidden nodes can be independent of the training data and that the hidden layer weights need not be adjusted. In terms of learning effectiveness, the original design of ELM has three objectives: high accuracy, minimal human intervention and fast learning speed. When ELM trains SLFNs, there are two major stages: random feature mapping and linear parameter solving. Fig 8 summarizes ELM characteristics with respect to different parameters, which helps clarify the ELM concept.
The rest of the paper is organized as follows. Section 2 discusses the Extreme Learning Machine: how it works, its essence, its learning model, kernel and non-kernel learning, its pertinent features and the basic learning principles on which ELM works. Section 3 describes the ELM theories, i.e. the theories that ELM satisfies. Section 4 describes the application areas of ELM. Section 5 presents ELM variants (proactive approaches of ELM) and a review of selected journal papers. Section 6 compares biological learning, conventional learning and ELM, and distinguishes deep learning from ELM. Section 7 presents a comparative analysis of data and suggested models: it analyses the data sets used by various algorithms, with pictorial representations, and suggests some models that may solve current problems. Section 8 concludes the survey.

Extreme Learning Machine
Huang G.B. proposed a new approach, called the Extreme Learning Machine (ELM), for enhancing the generalization performance of SLFNs while avoiding the time-consuming iterative tuning process [7]. The classical ELM procedure is as follows: first, the number of hidden neurons is determined, and the hidden layer biases and the weights between the input layer and the hidden layer are set at random; second, the output matrix of the hidden layer is computed; and finally, unlike in conventional network learning algorithms, the weights between the hidden and output layers are computed by the least squares method using the Moore-Penrose pseudoinverse (Fig 9 shows the step-by-step procedure of the classical ELM). Reasons for adopting ELM include its compact network structure (as shown in Fig 10), the easy computation of its network parameters and its fast learning speed. Training an SLFN with ELM consists of two main phases: a) random feature mapping, in which the hidden layer is randomly initialized so that a nonlinear feature mapping function converts the input space into the ELM feature space (as shown in Fig 11); and b) solving for the linear output parameters. ELM can use almost any nonlinear piecewise continuous function as its feature mapping (examples are given in Fig 12). The main idea of ELM is to assign the hidden layer weights and biases at random and then calculate the output weights as the least squares solution defined by the hidden layer output and the target outputs.
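The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed settings, not a reference implementation: the helper names `elm_train` and `elm_predict`, the sigmoid activation, the uniform sampling range [-4, 4] for the random weights and biases, and the toy sine target are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Basic ELM (hypothetical helper): random hidden layer, least-squares output weights."""
    n_features = X.shape[1]
    # Step 1: input weights and hidden biases are drawn at random and never tuned.
    W = rng.uniform(-4.0, 4.0, size=(n_features, n_hidden))
    b = rng.uniform(-4.0, 4.0, size=n_hidden)
    # Step 2: hidden layer output matrix H, shape (N, n_hidden).
    H = sigmoid(X @ W + b)
    # Step 3: output weights from the Moore-Penrose pseudoinverse, beta = pinv(H) T.
    beta = np.linalg.pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)  # 200 one-dimensional inputs
T = np.sin(np.pi * X)                            # targets, shape (200, 1)
W, b, beta = elm_train(X, T, n_hidden=50, rng=rng)
mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print(f"training MSE with 50 random hidden nodes: {mse:.2e}")
```

Only `beta` is learned; `W` and `b` keep their random values, which is why training reduces to a single linear solve.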
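Mathematically, for N distinct samples $(x_i, t_i)$ with $x_i \in \mathbb{R}^d$ and $t_i \in \mathbb{R}^m$, a standard SLFN with $\tilde{N}$ hidden nodes and activation function $g(x)$ is modelled as

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \qquad j = 1, \ldots, N,$$

where $w_i$ is the weight vector connecting the i-th hidden node to the input nodes, $b_i$ is its bias and $\beta_i$ is the weight vector connecting it to the output nodes. Approximating the N samples with zero error means $\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0$, i.e. there exist $\beta_i$, $w_i$ and $b_i$ such that

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \qquad j = 1, \ldots, N,$$

which can be written compactly as $H\beta = T$, where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}.$$

If the number of hidden nodes equals the number of distinct training samples ($\tilde{N} = N$), H is a square matrix, the output weights can be computed directly by inverting H, $\beta = H^{-1}T$, and the SLFN approximates the N training samples with zero error [9]. In most cases, however, the number of hidden nodes is much lower than the number of distinct training samples ($\tilde{N} \ll N$), so H is a non-square matrix and there may not exist $w_i$, $b_i$, $\beta_i$ ($i = 1, \ldots, \tilde{N}$) such that $H\beta = T$. In this situation, we seek specific $\hat{w}_i$, $\hat{b}_i$, $\hat{\beta}$ ($i = 1, \ldots, \tilde{N}$) such that [10]

$$\lVert H(\hat{w}_1, \ldots, \hat{w}_{\tilde{N}}, \hat{b}_1, \ldots, \hat{b}_{\tilde{N}})\,\hat{\beta} - T \rVert = \min_{w_i,\, b_i,\, \beta} \lVert H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}})\,\beta - T \rVert.$$

A distinguishing feature of ELM is that the input weights and hidden biases are generated at random instead of being tuned, which turns the nonlinear system into the linear system $H\beta = T$ [6]. Its minimum-norm least-squares solution is $\hat{\beta} = H^{\dagger}T$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of H. Various approaches are available for computing the Moore-Penrose generalized inverse, including iterative methods, the orthogonal projection method, orthogonalization methods and singular value decomposition (SVD).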
In both of the following cases, a regularization term on β is added to make the solution stable and to improve the generalization capability of the network; here C is a user-specified regularization parameter and I is the identity matrix:

a) When the number of training samples exceeds the number of hidden neurons ($N > \tilde{N}$), β can be computed as

$$\beta = \left(\frac{I}{C} + H^{T}H\right)^{-1} H^{T}T.$$

b) When the number of training samples is smaller than the number of hidden neurons ($N < \tilde{N}$), β can be computed as

$$\beta = H^{T}\left(\frac{I}{C} + HH^{T}\right)^{-1} T.$$
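The two regularized expressions are algebraically equivalent; they differ only in whether an $\tilde{N} \times \tilde{N}$ or an $N \times N$ system is solved, so in practice one picks the smaller system. The NumPy sketch below checks this numerically. The helper names, the random matrix standing in for H, the random targets and the value C = 10 are assumptions made for the demonstration.

```python
import numpy as np

def ridge_beta_tall(H, T, C):
    # Case a) form (hypothetical helper): solve the L x L system (I/C + H^T H) beta = H^T T.
    L = H.shape[1]
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)

def ridge_beta_wide(H, T, C):
    # Case b) form (hypothetical helper): solve the N x N system, then map back with H^T.
    N = H.shape[0]
    return H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)

rng = np.random.default_rng(1)
# Stand-in hidden layer output: N = 30 samples, 40 hidden nodes.
H = np.tanh(rng.normal(size=(30, 8)) @ rng.normal(size=(8, 40)))
T = rng.normal(size=(30, 2))
C = 10.0
beta_a = ridge_beta_tall(H, T, C)
beta_b = ridge_beta_wide(H, T, C)
print(np.allclose(beta_a, beta_b))
```

Here N < Ñ, so the second form solves a 30 x 30 system instead of a 40 x 40 one; with more samples than hidden nodes the first form would be the cheaper choice.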


• Hidden layer neurons do not need to be tuned.
• "Randomness" is one way of implementing ELM, but not the only one; some typical approaches use "semi-randomness".
• The hidden layer mapping h(x) satisfies the universal approximation conditions.
• Minimize: $\lVert H\beta - T \rVert^2$ and $\lVert \beta \rVert$, i.e. both the training error and the norm of the output weights.
• ELM satisfies both ridge regression theory and neural network generalization theory.
• ELM acts as a link between SVM, neural networks, Fourier series, random projection, linear systems, matrix theories, etc.

Kernel and non-kernel learning
ELM is applicable to both non-kernel and kernel learning.
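In kernel ELM, the product $HH^T$ of the random hidden layer is replaced by a kernel matrix Ω with $\Omega_{ij} = K(x_i, x_j)$, giving the output $f(x) = [K(x, x_1), \ldots, K(x, x_N)]\,(I/C + \Omega)^{-1}T$. The sketch below is a minimal illustration of that formula; the Gaussian kernel, γ = 5, C = 1000, the helper names and the toy two-dimensional target are assumptions for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances, then the Gaussian kernel exp(-gamma * d^2).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_elm_fit(X, T, C, gamma):
    # Hypothetical helper: alpha = (I/C + Omega)^{-1} T, with Omega_ij = K(x_i, x_j).
    omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + omega, T)

def kernel_elm_predict(X_new, X_train, alpha, gamma):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] alpha
    return rbf_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
T = np.sin(np.pi * X[:, :1]) * np.cos(np.pi * X[:, 1:])  # shape (100, 1)
alpha = kernel_elm_fit(X, T, C=1e3, gamma=5.0)
pred = kernel_elm_predict(X, X, alpha, gamma=5.0)
mse = float(np.mean((pred - T) ** 2))
print(f"kernel ELM training MSE: {mse:.2e}")
```

No hidden layer is generated explicitly here: the number of hidden nodes does not need to be chosen, but the cost grows with the number of training samples because Ω is N x N.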

Pertinent features of ELM
• The machine typically has one layer of hidden nodes.
• The weights between the input and hidden nodes are never changed.
• The weights between the hidden nodes and the output neurons can be computed in a single step.
• As a consequence, there is no interdependence between the parameters (weights and biases).
• Hidden nodes are created at random and are not iteratively tuned.
• Only the parameters between the hidden and output layers need to be learned.
• ELM is efficient and reaches the global optimum.
• Even with randomly generated hidden nodes, ELM maintains the universal approximation capability of SLFNs.
• Generalization performance is better: ELM's generalization ability outperforms that of SVM and its variants, such as LS-SVM.
• Least human intervention.
• Fast training speed and high learning accuracy.
• RKS (Random Kitchen Sinks) is a special case of ELM in which the hidden layer is constructed on a Fourier basis.
• ELM trains SLFNs in two stages: random feature mapping, and solving for the linear output parameters.
• The Moore-Penrose generalized inverse of H yields the optimal solution.
• ELM uses random feature mapping, whereas SVM uses a kernel function and deep neural networks use RBMs/autoencoders/autodecoders for feature mapping.
• Crux of ELM: generalization performance (minimum training error and smallest norm of weights), universal approximation capability, learning without iteratively tuning the hidden nodes, and a unified theory of learning.
Learning principles of ELM [52]:
I) Huang (2014) demonstrated that the hidden neurons of SLFNs with almost any nonlinear piecewise continuous activation function, or their linear combinations, can be generated at random according to any continuous sampling probability distribution, and such hidden neurons can be independent of the training samples as well as of their learning environment.
Learning stability and generalization performance, which most conventional learning algorithms omitted when they were first proposed, are also taken into account by ELM. II) (Huang, 2014) The output weights of generalized SLFNs should have small norms, subject to certain optimization constraints, in order to preserve system stability and generalization performance. III) (Huang, 2014) From the optimization perspective, the output nodes of SLFNs should have no biases (or their biases should be set to zero). This principle differs from the broad consensus in the NN community that output nodes require biases (Schmidt et al., 1992; White, 1989, 2006). From the optimization perspective, biases at the output nodes lead to suboptimal solutions.
• ELM has also been used in microarray data classification, where it outperformed SVM.
• Input data are nonlinearly embedded in a higher-dimensional space.
• ELM extends to multiclass classification and multi-output regression without an increase in training time.
• Data separation improves as the dimensionality of the mapped data increases.
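Random Kitchen Sinks, described above as a special case of ELM with a Fourier-basis hidden layer, can be illustrated with random Fourier features: hidden nodes of the form cos(w·x + b) with Gaussian w and uniform b, whose inner products approximate a Gaussian kernel. This is a sketch under assumed settings (σ = 1, 20 000 hidden nodes, two arbitrary test points, the hypothetical helper `random_fourier_features`); the output weights would then be solved by least squares exactly as in the basic ELM.

```python
import numpy as np

def random_fourier_features(X, n_features, sigma, rng):
    # Fourier-basis hidden layer: cos(w.x + b), w ~ N(0, I / sigma^2), b ~ U[0, 2*pi].
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(3)
x = np.array([[0.2, -0.4, 0.1]])
y = np.array([[0.5, 0.0, -0.3]])
sigma = 1.0
# Both points must pass through the same random hidden layer, so map them together.
Z = random_fourier_features(np.vstack([x, y]), n_features=20000, sigma=sigma, rng=rng)
approx = float(Z[0] @ Z[1])                                   # random-feature inner product
exact = float(np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2)))  # Gaussian kernel value
print(approx, exact)
```

The hidden parameters are drawn independently of the data, in line with learning principle I; only the sampling distribution (Gaussian here) is chosen in advance.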

ELM Theories
This section briefly describes ELM's interpolation theory, universal approximation capability and generalization bound.

Interpolation theory
The following theorem explains ELM's learning capability from an interpolation standpoint (here L is the number of hidden nodes and $a_i$, $b_i$ are the hidden node parameters).
Theorem 1 (Huang, Zhu, et al., 2006). Given any small positive value $\epsilon > 0$, any activation function that is infinitely differentiable in any interval, and N arbitrary distinct samples $(x_i, t_i) \in \mathbb{R}^d \times \mathbb{R}^m$, there exists $L < N$ such that, for any $\{a_i, b_i\}_{i=1}^{L}$ randomly generated from any interval of $\mathbb{R}^d \times \mathbb{R}$ according to any continuous probability distribution, with probability one, $\lVert H\beta - T \rVert < \epsilon$. Furthermore, if $L = N$, then with probability one, $\lVert H\beta - T \rVert = 0$.
Description: The above theorem shows that, for any given training set, there exists an ELM network that achieves an arbitrarily small training error (in the squared-error sense) with a number of hidden neurons that does not exceed the number of training samples. Moreover, if the number of hidden neurons equals the number of distinct training samples, the training error becomes zero with probability one. So, from the interpolation point of view, ELM can fit every sample exactly, given a sufficient number of hidden neurons.
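The interpolation result can be checked numerically. In this sketch (assumed: tanh activation, standard normal random data, targets and hidden parameters, N = 10 samples, and the hypothetical helper `hidden_matrix`), a square H with L = N hidden nodes is solved exactly, while a network with only 3 hidden nodes leaves a nonzero least-squares residual.

```python
import numpy as np

rng = np.random.default_rng(4)
N, d, m = 10, 5, 1
X = rng.normal(size=(N, d))  # N distinct samples
T = rng.normal(size=(N, m))  # arbitrary targets

def hidden_matrix(X, L, rng):
    # Random hidden parameters; tanh is infinitely differentiable, as Theorem 1 requires.
    W = rng.normal(size=(X.shape[1], L))
    b = rng.normal(size=L)
    return np.tanh(X @ W + b)

# L = N: H is square and (with probability one) invertible, so H beta = T exactly.
H_sq = hidden_matrix(X, N, rng)
beta_sq = np.linalg.solve(H_sq, T)
err_square = float(np.max(np.abs(H_sq @ beta_sq - T)))

# L < N: only a least-squares fit is possible, and the residual is generally nonzero.
H_thin = hidden_matrix(X, 3, rng)
beta_thin = np.linalg.pinv(H_thin) @ T
err_thin = float(np.max(np.abs(H_thin @ beta_thin - T)))
print(err_square, err_thin)
```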

Universal approximation capability
In conventional SLFN training, the hidden neuron parameters must be adjusted during the training phase, and the activation function of the hidden neurons is usually required to be continuous and differentiable. In ELM, however, the hidden layer parameters are generated at random without being trained, and ELM nevertheless remains a universal learner.
Theorem 2 (Huang & Chen, 2007, 2008; Huang, Chen, et al., 2006). Given any nonconstant piecewise continuous function G:
According to the above theorem, ELM does not require a continuous and differentiable activation function; a threshold function and certain other activation functions can be used instead, and these, together with the most frequently used activation functions, meet the requirements for the universal approximation capability of ELM.
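To illustrate that a non-differentiable activation is acceptable, the sketch below builds ELMs with a hard-limit (threshold) hidden layer and fits a sine curve with 5 and then 200 random hidden nodes. The helper name `step_elm_mse`, the sampling ranges and the target are assumptions; the point is only that the training error falls as hidden nodes are added, even though the activation has no derivative at its threshold.

```python
import numpy as np

def step_elm_mse(X, T, L, rng):
    # Hard-limit (threshold) activation: piecewise continuous but not differentiable.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = (X @ W + b >= 0.0).astype(float)
    beta = np.linalg.pinv(H) @ T  # least-squares output weights
    return float(np.mean((H @ beta - T) ** 2))

rng = np.random.default_rng(5)
X = np.linspace(-1.0, 1.0, 400).reshape(-1, 1)
T = np.sin(np.pi * X)
mse_small = step_elm_mse(X, T, L=5, rng=rng)
mse_large = step_elm_mse(X, T, L=200, rng=rng)
print(mse_small, mse_large)
```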
Theorem 3. Given any feature mapping
This theorem demonstrates that, for classification tasks, ELM can approximate any complex decision boundary if the number of hidden layer nodes is large enough.
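As an illustration of an ELM classifier approximating a nonlinear decision boundary, the sketch below trains on points labelled by a circle and classifies by the sign of the network output. The data, the 150 tanh hidden nodes, the weight scales and the ±1 target coding are assumptions for this example, and only training accuracy is checked.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
# Nonlinear (circular) decision boundary: +1 inside radius 0.6, -1 outside.
y = np.where(np.sum(X**2, axis=1) < 0.36, 1.0, -1.0)

L = 150
W = rng.normal(scale=3.0, size=(2, L))  # random input weights, never tuned
b = rng.normal(scale=1.0, size=L)       # random hidden biases, never tuned
H = np.tanh(X @ W + b)
beta = np.linalg.pinv(H) @ y            # least-squares fit to the +/-1 labels
train_acc = float(np.mean(np.sign(H @ beta) == y))
print(f"training accuracy: {train_acc:.3f}")
```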

Generalization bound
Techniques used to analyse the generalization ability of a learning machine include the Bayesian framework, statistical learning theory (SLT) and cross-validation. Within SLT, Vapnik-Chervonenkis (VC) dimension theory is one of the most frequently used analytical frameworks for generalization bounds. The VC dimension was originated by Vladimir Vapnik and Alexey Chervonenkis. It measures the capacity of a statistical classification algorithm and is defined as the cardinality of the largest set of points that the algorithm can shatter.
The expected test error (actual risk) can be specified as

$$R(\alpha) = \int L\big(y, f(x, \alpha)\big)\, dP(x, y),$$

where $f(x, \alpha)$ is the prediction function, α denotes its parameters, $L(y, f(x, \alpha))$ is the loss function, and $P(x, y)$ is the cumulative probability distribution that generates the training and testing samples.
In practice, the actual risk $R(\alpha)$ is difficult to compute, so the empirical risk over the N training samples is computed instead:

$$R_{emp}(\alpha) = \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i, \alpha)\big).$$

The following inequality holds with probability $1 - \eta$:

$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln(2N/h) + 1\right) - \ln(\eta/4)}{N}} \tag{12}$$

where h is the VC dimension of the learning machine. The inequality above is known as the "VC bound", an upper bound on the actual risk. Structural risk minimization (SRM) is the minimization of the VC bound. From the SRM point of view, to guarantee good generalization performance on the test set, an algorithm should achieve a low training error on the training set and also have a low VC dimension. Some traditional techniques (SVM, BP, etc.) achieve acceptable performance on training sets, but computing their VC dimension is very cumbersome, and it can even be infinite.
In the case of ELM, the generalization bound is easy to calculate, because the VC dimension equals the number of hidden neurons with probability one.
Theorem 4 (Liu, Gao, et al., 2012). The VC dimension of an ELM with L hidden nodes whose activation functions are infinitely differentiable in any interval is equal to L with probability one.
Under the SRM framework, ELM can therefore be considered an ideal classification model by combining Theorem 1 (ELM can achieve a low approximation error on the training set) and Theorem 4 (the VC dimension of ELM equals its number of hidden nodes L).
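The confidence term of the VC bound in equation (12) can be evaluated directly. Because Theorem 4 gives h = L for ELM, the sketch below treats the number of hidden nodes as the VC dimension; the helper name `vc_confidence`, the sample size N = 10 000 and η = 0.05 are assumptions for illustration.

```python
import math

def vc_confidence(h, n, eta):
    """Confidence term of the VC bound: sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n)."""
    return math.sqrt((h * (math.log(2.0 * n / h) + 1.0) - math.log(eta / 4.0)) / n)

# For ELM, the VC dimension h equals the number of hidden nodes L (Theorem 4).
n, eta = 10000, 0.05
for L in (10, 100, 1000):
    print(L, round(vc_confidence(L, n, eta), 3))
```

The term grows with L and shrinks as more training samples are added, which is the trade-off SRM balances against the training error.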

Application Area of ELM
ELM plays a major role especially in classification, regression, feature learning, compression and clustering (as shown in Fig 13). Some specific applications in these categories are shown in Fig 14.

Classification
Classification is a form of supervised learning in machine learning that assigns the items of a given data set to classes (as shown in Fig 15).

Fig 15: Classification in machine learning
Some of the applications of ELM in classification are as follows. Guang-Bin Huang et al. [12] presented multiclass classification using ELM, which is advantageous over LS-SVM and PSVM because these traditional approaches cannot be applied directly to multiclass classification. For multi-label face recognition applications, Weiwei Zong et al. [13] introduced one-against-all (OAA) and one-against-one (OAO) ELM. Wu Jun et al. [14] used ELM in image classification to quickly train positive and negative fuzzy rule systems. Yanpeng Qu et al. [15] used ELM in mammographic risk analysis. Qi Yuan et al. [16] implemented epileptic EEG classification based on ELM. Sergio Decherchi et al. [17] produced a digital implementation of ELM for classification. Hongming Zhou et al. [18] carried out credit risk evaluation with ELM. Mahesh Pal et al. [19] implemented remote sensing image classification using ELM. Yuan Lan et al. [20] proposed an extreme learning machine approach for speaker recognition. Deepa et al. [21] implemented brain tumour classification in 3D MR images with ELM. Wenbin Zheng et al. [22] performed text recognition using ELM. Wenbin Zheng et al. [23] implemented spectroscopy-based food classification with ELM. Xiaoxuan Lu et al. [24] applied robust ELM to indoor positioning. Xiaodong Li et al. [25] suggested a multiple-kernel-learning-based ELM for classification design. Shuliang Xu et al. [26] implemented data stream classification with dynamic ELM. Zaher Mundher et al. [27] implemented ELM for river flow forecasting. Dibyasundar Das et al. [28] used ELM in handwritten character recognition. Honghao Zhu et al. [29] implemented ELM in credit card fraud detection. Mohammed Attique et al. [30] applied ELM to classify COVID-19 infections against normal chest CT scans.
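Multiclass classification in the style of [12] needs no one-against-all decomposition: the network has one output node per class, and a single least-squares solve trains all of them. A sketch follows, assuming three synthetic Gaussian clusters, 40 tanh hidden nodes and one-hot (0/1) targets; other target codings are possible.

```python
import numpy as np

rng = np.random.default_rng(7)
n_classes, n_per_class = 3, 60
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(n_per_class, 2)) for c in centers])
labels = np.repeat(np.arange(n_classes), n_per_class)

# One output node per class; each target row is one-hot for its class.
T = np.eye(n_classes)[labels]

L = 40
W = rng.normal(size=(2, L))
b = rng.normal(size=L)
H = np.tanh(X @ W + b)
beta = np.linalg.pinv(H) @ T        # one least-squares solve covers all classes
pred = np.argmax(H @ beta, axis=1)  # predicted class = output node with the largest value
accuracy = float(np.mean(pred == labels))
print(f"training accuracy: {accuracy:.3f}")
```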

Regression
Regression is also a form of supervised learning and is mainly used to predict continuous values (as shown in Fig 16).

Fig 16: Types of regression in machine learning
Yubo Yuan et al. [31] gave an optimization approximation solution for ELM-based regression problems. F.L. Chen et al. [32] used a gray extreme learning machine in a sales forecasting system. Yimin Yang et al. [33] applied bidirectional ELM to regression problems. Guorui Feng et al. [34] used evolutionary selection ELM for regression. Jiuwen Cao et al. [35] implemented a self-adaptive evolutionary ELM for this type of problem. Yoan Miche et al. [36] used a regularized ELM to solve regression problems with missing data. Qing He et al. [37] proposed a parallel ELM for regression based on MapReduce. Guoqiang Li et al. [38] used an enhanced ELM based on ridge regression for regression. S. Balasundaram et al. [39] applied the Newton method to a 1-norm ELM for regression and multiclass classification. Kai Zhang et al. [40] recommended an outlier-robust ELM for regression problems. Jarley Palmeira et al. [41] developed a Kalman-filter-driven online sequential ELM, especially for regression problems. Li Ying et al. [42] suggested an orthogonal incremental extreme learning machine for regression and multiclass classification. Xiong Luo et al. [43] recommended L1- and L2-norm regression and classification using an extreme learning machine. Shen Yuong et al. [44] used an extreme learning machine with constrained optimization for noisy data regression. Tassadaq et al. [45] used ELM in speech enhancement. Jie Zhang et al. [46] suggested a residual compensation extreme learning machine for regression. Zaher et al. [47] implemented ELM to predict the compressive strength of lightweight foamed concrete. Yong Shi et al. [48] recommended a fast learning machine for ordinal regression. Ahmed et al. [49] used an ELM model in water network management. Zhenglei et al. [50] used ELM in modelling the colour-fading ozonation of reactive-dyed cotton. Turan et al. [51] used combinations of ELM and support vector regression to predict wear losses in magnesium alloys coated with various spray-coating techniques.
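The online sequential ELM variants mentioned above, such as [41], build on a recursive least-squares update of β as new data chunks arrive. The sketch below implements the basic OS-ELM update (without the Kalman-filter extension of [41]) and checks that it reproduces the batch least-squares solution; the chunk sizes, tanh activation, helper name `hidden` and toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
d, L = 3, 10
W = rng.normal(size=(d, L))  # random input weights, fixed after initialization
b = rng.normal(size=L)       # random hidden biases, fixed after initialization

def hidden(X):
    return np.tanh(X @ W + b)

X = rng.normal(size=(110, d))
T = np.sin(X[:, :1]) + 0.1 * X[:, 1:2]  # toy targets, shape (110, 1)

# Initialization phase: batch solution on the first 50 samples (needs at least L samples).
H0 = hidden(X[:50])
P = np.linalg.inv(H0.T @ H0)
beta = P @ H0.T @ T[:50]

# Sequential phase: recursive least-squares update for each chunk of 20 samples.
for start in range(50, 110, 20):
    Hk = hidden(X[start:start + 20])
    Tk = T[start:start + 20]
    gain = np.linalg.inv(np.eye(len(Hk)) + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ gain @ Hk @ P
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)

# Batch least squares on all 110 samples, for comparison.
beta_batch = np.linalg.lstsq(hidden(X), T, rcond=None)[0]
print(np.max(np.abs(beta - beta_batch)))
```

Each update only inverts a matrix the size of the incoming chunk, so old samples never need to be revisited.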

ELM Variants or Proactive approaches of ELM
ELM has been implemented in various applications, and researchers have proposed various algorithms based on the concept of ELM, showing that ELM-based techniques perform better than conventional approaches. Some of the ELM variants are listed in Table 1. Details of some journal papers are listed in Table 2, including the advantages of the techniques used in the papers and the gaps or future work associated with them.
Manage multi-output multiclass regression and classification problems 3) Faster convergence 4) Faster than conventional methods 5) Same accuracy as SVM, MLP or GP 6) Small response time. ELM: 1) Issues with irrelevant or correlated data. OP-ELM: 1) Slower than ELM 2) Impact of data set size or problem type
Proposed scheme: 1) Cannot distinguish pedestrians who are close together 2) Certain candidates, such as trees, cars, lamps and streets, are detected as pedestrians
2) Applied in a distributed computing platform.
10. [51] 2020, SVR and ELM. ELM: fast and better prediction results, with better MAE and MSE values. 1) Model implementation with other coating technologies 2) Benefit to design engineers and design flexibility

Comparison between learning methods
Having discussed the various aspects of the extreme learning machine, it is useful to consider some basic differences between biological learning, conventional learning and ELM. Among these three learning methods, ELM performs better than conventional learning (e.g., BP) and is closer to biological learning. The prominent features of ELM discussed earlier explain why it has attracted attention in recent years. Fig 17 compares the learning algorithms on features such as stability, parallelization, hardware implementation, human involvement, complexity, online sequential learning, speed, accuracy and model. Fig 18 depicts the distinction between deep learning, which is familiar to most readers and used in most applications, and the newer learning technique, ELM. The two techniques are compared on features such as the amount of human involvement, sensitivity to network size, the scale of applications they suit, resource requirements, parallel and hardware implementation, suitability for micro-level learning and online incremental learning, and application-related complexity. From Fig 18, we conclude that ELM is better than deep learning in most of these respects and also gives faster results. Table 3 compares various learning algorithms with ELM. In [6], the researchers compared ELM approaches and found that the proposed algorithm (GSO-ELM) is more accurate than the traditional ELM (as shown in Fig 19). The researchers in [9] found that, for Malayalam handwritten character recognition, OS-ELM with 1200 hidden nodes, random block sizes of 200 to 400 and a sine activation function achieved the highest detection rate and was faster than all the other works (as shown in Fig 20). The investigator in [10] found that ELM, implemented in real time to detect blackout risk, performed better than the other methods compared on the same data set (as shown in Fig 21).
In [11], the researchers found that the performance of ELM-AE is better than or similar to that of the other methods compared, but ELM-AE takes less training time (as shown in Fig 22). The researchers in [12] found that ELM consistently delivers performance similar to that of SVM and LS-SVM with a much faster learning speed (as shown in Fig 23). Experiments in [16] revealed that, compared with BP networks and SVM classification, ELM is more accurate and consumes much less time (as shown in Fig 24). In [18], using two different data sets, the researchers concluded that ELM is more accurate than SVM (as shown in Fig 25). The findings in [19] show that KELM improves classification accuracy relative to SVM on the various data sets used in that study. Like the SVM, KELM does not produce a probabilistic output. On the other hand, KELM uses all training pixels, whereas the SVM produces a sparse solution that requires fewer training pixels for classification (as shown in Fig 26).

Comparative analysis of data
In [21], the tumour segmentation results were promising compared with those reported in the literature. Compared with the support vector machine, the most widely used classifier for medical image classification, the extreme learning machine achieved improved grading (as shown in Fig 27). In [23], the researchers concluded that ELM achieves relatively high precision on all four data sets (as shown in Fig 28).
Comparisons with three current approaches, LM, CFWNN-ELM and SVM, demonstrated in [35] that SaE-ELM can effectively improve network generalization performance (as shown in Fig 29). In [38], the authors demonstrated that RR-ELM has good performance and stability, efficiently reducing the adverse effects of disturbances and multicollinearity in the linear model; RR-ELM mainly addresses regression problems. The experiments in [46] show that RC-ELM has good overall performance and robustness, and that RC-ELM can provide a generalized framework for addressing regression problems. The experimental results in [56] show that EN-ELM achieves remarkable accuracy compared with the original ELM and with other classifiers on face recognition and several other classification tasks (as shown in Fig 30). The results also show that EOSELM achieves higher accuracy (as shown in Fig 31).

Suggested models
Based on Table 3, ELM can be considered a good option for machine learning applications such as classification, regression and feature learning, and a good alternative to conventional learning techniques, especially with respect to accuracy, training time, complexity and the tuning of hidden node parameters. Various ELM variants are available for use in different applications. As Table 3 and the related figures show, ELM's performance in most applications is outstanding, and it takes less training time than existing approaches; these two features make ELM attractive today. In addition, its parallel and hardware implementation is easy because it does not need a large amount of resources; it is well suited to micro-level learning and online incremental learning; it is not sensitive to network size; it is almost free of human intervention; and it is suitable for a wide range of applications.
Four models are suggested that may be useful in many applications and may solve the problems associated with them:

Conclusion
In machine learning and data analysis, SVMs and neural networks play a significant role (especially in classification); however, they face some challenges, namely intensive human intervention, slow learning speed and poor learning scalability. ELM addresses these problems. ELM does not work with every kernel: it works when the kernel satisfies the universal approximation condition. Also, with this learning technique there is no need to choose the number of hidden layers or a learning rate in advance. Some of the most significant benefits of ELM are as follows: learning is possible without iterative tuning for generalized SLFNs; it offers a unified learning paradigm for regression, classification, feature learning, clustering and compression applications; and it achieves a small training error together with the smallest norm of output weights, which results in good generalization performance. ELM works with various types of hidden nodes, including kernels. This paper has covered nearly all relevant aspects of the extreme learning machine, an emerging learning technique. More research is needed on ELM so that its pros and cons can be examined in further work and its advantages can be exploited in more applications.

Declarations
Ethics approval and consent to participate: Not applicable.