A deep network architecture can be generative or discriminative depending on its operation; this classification is given in Fig. 4. A generative model learns the joint probability distribution p(x, y), while a discriminative model learns the conditional probability distribution p(y|x), where x is the input variable space and y is the target. A generative model can synthesize new samples from the underlying data distribution, whereas a discriminative model only maps from the input space to the target. The main deep learning architectures are summarized in Fig. 5, which shows the basic architectures of: 1. the Restricted Boltzmann Machine (RBM) and its variants, 2. the auto-encoder (AE) and stacked AE (SAE), 3. the Convolutional Neural Network (CNN), 4. the Recurrent Neural Network (RNN), both unidirectional and bidirectional, and 5. the Generative Adversarial Network (GAN).
In the next section, a brief description of the different deep learning models is presented, along with some of their application areas in the PHM community.
4.1 Auto-Encoder and stacked Auto-Encoder
An auto-encoder involves two parts: an encoder and a decoder. The encoder compresses the input data into a hidden layer with a reduced number of neurons, and the decoder tries to reconstruct the input data from this compressed representation. Training minimizes the average reconstruction loss. AEs are mainly used in unsupervised learning. Their main features include: 1. By reducing the number of neurons in the hidden layers, the network must learn representative features of the input data to achieve a successful reconstruction. 2. Nonlinear activation functions, such as ReLU, tanh and sigmoid, enable the learning of complex feature representations. 3. Training is done in a greedy layer-wise manner.
A stacked auto-encoder (SAE) consists of deeply stacked AE layers, where the hidden representation of each layer is passed as input to the next layer. Training follows the greedy layer-wise technique. Variants of the AE include the sparse auto-encoder, sparse stacked auto-encoder (SSAE), denoising auto-encoder (DAE), contractive auto-encoder and variational auto-encoder. Auto-encoder and stacked auto-encoder architectures are shown in Fig. 5. AEs have been used heavily for fault diagnosis in different applications; some of the latest studies are described next.
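The training objective described above can be made concrete with a minimal sketch, not taken from any of the cited studies: a single-hidden-layer auto-encoder with a sigmoid encoder and linear decoder, trained by gradient descent on the mean squared reconstruction error. All sizes and constants here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 8-dimensional input (all sizes here are arbitrary).
X = rng.normal(size=(200, 8))

n_in, n_hidden = 8, 3                  # bottleneck: 3 < 8 forces compression
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))   # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # decoder weights
lr = 0.05

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for epoch in range(500):
    H = sigmoid(X @ W1)                # encoder: compress into the hidden layer
    X_hat = H @ W2                     # decoder: reconstruct the input
    err = X_hat - X
    losses.append(np.mean(err ** 2))   # average reconstruction loss
    # Gradient descent on the reconstruction loss (constant factors folded into lr).
    grad_W2 = H.T @ err / len(X)
    grad_H = err @ W2.T
    W1 -= lr * (X.T @ (grad_H * H * (1 - H)) / len(X))
    W2 -= lr * grad_W2
```

Because the bottleneck is narrower than the input, the network cannot simply copy its input and is forced to learn a compact representation, which is the property PHM studies exploit for feature extraction.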
Shao et al. [33] used a deep AE architecture for fault diagnosis of gearboxes and electric locomotive roller bearings. A new auto-encoder loss function based on correntropy was designed to enhance the feature learning process, and the parameters of the deep auto-encoder were optimized using the artificial fish swarm algorithm (AFSA). The method proved better than earlier ones in terms of accuracy.
In a DAE, Gaussian noise can be added to the input data before it is fed to the hidden layer; binary noise can also be used. Another fault diagnosis study was conducted by Meng et al. [34] using a novel DAE for rolling bearings. To overcome the limitations of the DAE in feature learning, especially in the case of non-substantial input data, this study used a modified AE with an enhanced norm penalty and an improved preprocessing method. Both studies used only vibration data, neglecting other signals such as acoustic emission, which is considered important in machinery applications. This opens horizons for future work in fault diagnosis of rolling bearings.
Jiang et al. [35] proposed a sliding-window DAE (SW-DAE) algorithm for fault detection of wind turbines. First, a sliding window is applied to the multivariate time-series data to capture current and previous temporal information, and then the DAE model reconstructs the windowed input. This study, however, addresses only fault detection, which is a limited scope. Other studies considered fault diagnosis for applications such as wind turbines [36] and solid oxide fuel cell systems [37]. We can conclude that the AE and its variants are employed directly in fault diagnosis applications or as feature extractors in other PHM applications.
4.2 Restricted Boltzmann Machine (RBM)
The RBM is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. Its units form a bipartite graph of visible and hidden units with no intra-layer connections, hence the name "restricted". The training algorithm is the gradient-based contrastive divergence algorithm [38]. In supervised settings, the RBM is mostly used as a pre-processor for the classification stage in other DL-based approaches, though it can also act as a classifier itself. Variants of the RBM include the deep belief network (DBN) and the deep Boltzmann machine (DBM).
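As a rough sketch of how one-step contrastive divergence (CD-1) trains an RBM with binary visible and hidden units, consider the following toy example. It is illustrative only; the data, sizes and constants are invented here, not taken from [38].

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4                       # illustrative layer sizes
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data: each sample is [l, l, l, 1-l, 1-l, 1-l] for a random bit l.
labels = rng.integers(0, 2, size=100)
V = np.stack([np.concatenate([np.full(3, l), np.full(3, 1 - l)])
              for l in labels]).astype(float)

W = rng.normal(scale=0.1, size=(n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)

for step in range(200):
    # Positive phase: hidden activations driven by the data.
    p_h = sigmoid(V @ W + b_hid)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Negative phase (CD-1): one Gibbs step back through the visible layer.
    p_v = sigmoid(h @ W.T + b_vis)
    v_neg = (rng.random(p_v.shape) < p_v).astype(float)
    p_h_neg = sigmoid(v_neg @ W + b_hid)
    # Update: data-driven statistics minus model-driven statistics.
    W += lr * (V.T @ p_h - v_neg.T @ p_h_neg) / len(V)
    b_vis += lr * (V - v_neg).mean(axis=0)
    b_hid += lr * (p_h - p_h_neg).mean(axis=0)
```

Because there are no intra-layer connections, all hidden units can be sampled in parallel given the visible layer (and vice versa), which is what makes this Gibbs step cheap.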
4.2.1 Deep belief network (DBN)
Stacking multiple RBMs results in a deep belief network (DBN). In a DBN, the connections between the top two layers are undirected, while the lower layers have top-down directed connections. Training is done in two phases: an unsupervised pretraining phase using a greedy layer-wise bottom-up procedure [39], followed by a fine-tuning phase using the back-propagation algorithm in a top-down process.
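The greedy layer-wise pretraining scheme can be sketched as follows: each RBM is trained on the hidden activations of the layer below it, bottom-up. This is a simplified illustration using one-step contrastive divergence with mean-field reconstructions; all sizes, data and constants are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hid, steps=100, lr=0.1):
    """Train one RBM with CD-1 and return its weights and hidden biases."""
    n_vis = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_vis, n_hid))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
    for _ in range(steps):
        p_h = sigmoid(data @ W + b_h)                 # positive phase
        h = (rng.random(p_h.shape) < p_h).astype(float)
        v_neg = sigmoid(h @ W.T + b_v)                # one Gibbs step (mean-field)
        p_h_neg = sigmoid(v_neg @ W + b_h)
        W += lr * (data.T @ p_h - v_neg.T @ p_h_neg) / len(data)
        b_v += lr * (data - v_neg).mean(axis=0)
        b_h += lr * (p_h - p_h_neg).mean(axis=0)
    return W, b_h

X = (rng.random(size=(200, 12)) < 0.5).astype(float)  # toy binary data

# Greedy layer-wise pretraining: one RBM at a time, bottom-up.
layer_sizes = [8, 4]
activations, weights = X, []
for n_hid in layer_sizes:
    W, b_h = train_rbm(activations, n_hid)
    weights.append((W, b_h))
    # The hidden activations become the "visible" data of the next RBM.
    activations = sigmoid(activations @ W + b_h)

# 'weights' would then initialize the DBN for back-propagation fine-tuning.
```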
DBNs were among the first deep networks to be trained effectively and used in PHM applications. Again, optimization algorithms were added to improve the performance of earlier studies. Optimization is the process of maximizing benefit, i.e., minimizing error, by selecting the best hyperparameters of the model. Examples of hyperparameters are the number of hidden layers, the number of neurons in a single layer and the learning rate. In their study, Shao et al. [40] investigated a rolling element bearing dataset to find the optimal hyperparameters for fault diagnosis. Tang et al. [41] used Nesterov momentum (NM) to increase training speed and enhance performance. Another fault diagnosis problem, concerning traction motor bearings in high-speed trains, was investigated using a DBN in [42]; the learning rate in this method was adaptive. DBNs may also serve as the feature extraction part of a DL model.
Yuan et al. [43] conducted their research on wavelet packet transform (WPT) features, using a pair of different DBNs to extract features and temporal dependencies. Most of these studies focused on fault diagnosis applications that might be utilized as a prior step for RUL prediction. In [44], Tang et al. proposed a new fault diagnosis method called Fisher discriminative sparse representation (FDSR), in which a DBN is also used as a feature extractor. Dictionary learning yields smaller within-class scatter and greater between-class scatter, so the reconstruction error and sparse coefficients are discriminative, a major advantage of this method. In the domain of health assessment, Peng et al. [45] employed a DBN in RUL prediction by constructing a health indicator for the degradation process. A particle filter was used for RUL estimation on an aircraft engine dataset and was improved using a fuzzy inference system. Some researchers developed end-to-end models, which is one merit of DL.
Xie et al. [46] proposed a fault diagnosis model based on an adaptive DBN for extracting deep features that represent rotating machines, in order to distinguish bearing fault types and degrees. They employed a DBN with an adaptive learning rate optimized by NM; compared against SVM and a standard DBN, it achieved higher accuracy. Another fault diagnosis study that utilized the ability of the DBN to capture higher-level representations was presented by Liu et al. [47]. In this study, raw signals output from analog circuits are applied to a Gaussian-Bernoulli (GB) DBN to perform fault detection and isolation (FDI). This is a multi-class classification task, and the results showed that fault diagnosis based on the GB-DBN outperforms earlier methods. DBNs can also be used in the detection of malware on Android systems [48] and in traffic prediction considering weather factors [49].
We can conclude from this that DBNs are mostly used for feature extraction as a pre-phase of the model, i.e., for dimensionality reduction, which is also one phase of the PHM cycle. They can also be used in an ensemble with other DL architectures to perform the prediction process, as will be discussed.
4.2.2 Deep Boltzmann Machine (DBM)
The deep Boltzmann machine (DBM) may be seen as a deep RBM with multiple hidden layers, where all connections are undirected. In the training phase, all the layers are jointly trained using a stochastic maximum likelihood (SML) based algorithm. More information about the training of DBMs can be found in [50].
Few studies have employed the DBM in PHM, mostly for fault diagnostics [51, 52]; both studies were applied to gearboxes for diagnosis and fault classification. Hu et al. [52] used an ensemble of a DBM and a random forest (RF) for fault classification to deal with industrial big data. They proposed a collaborative method combining DBMs with a multi-grained scanning forest ensemble. The research was carried out on the Tennessee Eastman Process (TEP) fault diagnosis benchmark and proved to have classification accuracy competitive with earlier methods. However, they employed DBMs to generate binary (0/1) features, which may waste some information; this drawback may open avenues for later research.
4.3 Convolutional Neural Network (CNN)
The Convolutional Neural Network (CNN) has proven successful in various applications, including natural language processing (NLP), speech recognition and computer vision. Figure 4 shows the architecture of a 2-D CNN with three different parts: convolutional layers, pooling layers and a number of fully connected layers. A convolutional layer performs a convolution operation between its input and a sliding window (filter or kernel); the output of this layer is the feature map. One merit of the network is that these filters or kernels are learned automatically rather than handcrafted.
The convolutional layer output is processed by the pooling layer, which extracts the most important local features. This reduces the dimensionality of the intermediate layers, thus avoiding overfitting. Moreover, the dimensionality reduction of the feature map lowers the number of variables while increasing the shift-invariance property. In [53], a CNN-based technique was introduced to form a health indicator (HI) and applied to the prognostics of a rolling bearing system. In a similar way, a DCNN was used by Belmiloud et al. [54] to estimate the RUL of rolling bearings, and a DCNN was used for bearing defect size estimation in [55].
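The two operations just described, convolution producing a feature map and pooling shrinking it, can be illustrated directly. The following is a minimal sketch with a single hand-chosen kernel; in a real CNN the kernels are learned during training.

```python
import numpy as np

# A toy 6x6 single-channel "image".
x = np.arange(36, dtype=float).reshape(6, 6)

# A 3x3 vertical-edge kernel; in a real CNN this is learned, not handcrafted.
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])

def conv2d(img, kernel):
    """Valid 2-D cross-correlation, the operation in a CNN convolutional layer."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keeps the strongest local response."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

fmap = conv2d(x, k)        # feature map: 6x6 input -> 4x4
pooled = max_pool(fmap)    # pooling: 4x4 -> 2x2, fewer variables
```

Pooling halves each spatial dimension here, which is exactly the dimensionality reduction and shift-invariance effect described above.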
CNNs were originally used to analyze images. Therefore, many researchers proposed methods to preprocess and convert time-series data into 2-D inputs for system health assessment.
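A common form of this preprocessing is to slice the multivariate time series into fixed-length windows, yielding 2-D (channels × time) samples that a CNN can consume. The sketch below is generic, not the exact scheme of any cited study; the window length, stride and sensor count are arbitrary.

```python
import numpy as np

# Toy multivariate time series: 4 sensor channels, 100 time steps.
signal = np.random.default_rng(0).normal(size=(4, 100))

def to_windows(x, win, stride):
    """Slice a (channels, time) series into overlapping 2-D windows."""
    starts = range(0, x.shape[1] - win + 1, stride)
    return np.stack([x[:, s:s + win] for s in starts])

windows = to_windows(signal, win=20, stride=10)
# Each windows[i] is a 4x20 "image" covering current and previous samples,
# ready to be fed to a 2-D CNN.
```

Overlapping strides (stride < window length) let consecutive windows share temporal context, which is the same idea behind the sliding-window schemes used for fault detection above.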
Huang et al. [56] proposed a reshaped time-series convolutional neural network (RTSCNN) method based on multi-sensor raw signal fusion to predict tool wear of a CNC machine under milling operations. The raw sensor signals (3-D forces, 3-D vibrations and AE) are collected and reshaped into an image-like form. Three convolutional layers and three pooling layers are applied to this reshaped matrix of raw signals to extract highly discriminative features, and a fully connected layer with a ReLU activation function followed by a regression layer completes the prediction of each flute's tool wear. The proposed architecture performs well compared with methods that use handcrafted features, in terms of both root mean squared error (RMSE) and mean absolute error (MAE). Experiments were conducted to determine the best number of training epochs and the dropout percentage for training acceleration, and Nesterov momentum proved to accelerate training best. The same group also deployed multi-domain feature fusion from three-dimensional cutting force and vibration signals, constructed an input matrix from them, and applied a deep convolutional network to predict tool wear of three cutters of a high-speed CNC machine under milling operations [57]. It outperforms [56], which uses a similar DCNN architecture on the milling dataset but with raw sensor data, in terms of MAPE and RMSE. A DCNN was also used to deal directly with raw data, merely normalized and requiring no domain expertise, for RUL estimation on the C-MAPSS dataset [58]. We can conclude that the DCNN is a promising method and can be combined with other DL methods as the feature extraction part of a prognostics model, as can be noticed in the following subsections.
4.4 Recurrent neural network (RNN)
A Recurrent Neural Network (RNN) acts as a memory cell that saves the status of previous cells, making it the most suitable architecture for sequential data applications such as NLP and those involving time-series data. In the training phase, the hidden-unit state is updated from the previous cell state and the current input through an activation function. RNNs are capable of capturing long-term and transient dependencies from time-series and sequential data, but they have several drawbacks, such as the vanishing and exploding gradient problems. To overcome these issues, new versions of the RNN were introduced: Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), as shown in Fig. 6. A gating mechanism is the key feature behind these two architectures; it allows important features of the input to be maintained. The bidirectional LSTM depends on both previous and next states, increasing the flexibility and power of the RNN and making it useful in time-series applications.
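The gating mechanism can be made concrete with a single LSTM cell step. The update below follows the standard LSTM equations (with biases omitted for brevity); the weights are random placeholders, not trained values, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5                       # illustrative sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on [h_prev, x] concatenated.
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(n_hid + n_in, n_hid))
                      for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(z @ W_f)                 # forget gate: what to discard
    i = sigmoid(z @ W_i)                 # input gate: what to write
    o = sigmoid(z @ W_o)                 # output gate: what to expose
    c_tilde = np.tanh(z @ W_c)           # candidate cell content
    c = f * c_prev + i * c_tilde         # gated memory update
    h = o * np.tanh(c)                   # new hidden state
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(10):                      # unroll over a toy sequence
    x_t = rng.normal(size=n_in)
    h, c = lstm_step(x_t, h, c)
```

The additive form of the cell update, c = f * c_prev + i * c_tilde, is what lets gradients flow across many time steps, mitigating the gradient problems of the plain RNN.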
RNNs are well suited to the computational and storage demands of big PHM data because of the following characteristics: (1) effective storage of past states due to the distributed hidden state, and (2) adaptation of the hidden states to complex mappings due to their non-linear dynamics.
Many studies have investigated the use of LSTM networks to monitor the health of machinery systems. Zheng et al. [59] proposed an LSTM-based RUL prediction method. Experiments were done on the C-MAPSS and milling datasets, and it outperforms other methods such as CNN and SVR in terms of both RMSE and the score function.
A bidirectional LSTM (BiLSTM) network can capture the relationships in sensory data both forward and backward, extracting the maximum benefit from the input; thus it was used by Wang et al. [60] for RUL prediction of turbofan engines. The C-MAPSS dataset was again used, and the PHM08 Data Challenge scoring function was computed for evaluation. The data were divided into training and testing parts, and comparison against previous research showed better performance in terms of RMSE. Zhao et al. [61] proposed an algorithm based on a local-feature GRU. The algorithm starts with handcrafted features extracted from time-series data split into fixed-size windows. These features are fed to a bidirectional GRU to capture higher-level representations, and then a supervised layer performs the learning and prediction. However, this work can be criticized for using handcrafted features, which require domain expertise. In [62], a health index is constructed using KPCA and an exponentially weighted moving average (EWMA) to depict the degradation of rolling bearings. This HI is fed to a hierarchical GRU, built by stacking layers, to estimate the future HI and predict RUL; the method proved to outperform earlier ones. In [63], an enhanced similarity-based RUL estimation method was combined with an RNN-based auto-encoder scheme and applied to the turbofan engine dataset, which demonstrated the advantage of the suggested ensemble.
4.5 Generative adversarial network (GAN)
Generative adversarial networks (GANs) are powerful generative models first introduced by Goodfellow et al. [64]. A GAN consists of two parts: a generator and a discriminator. The generator learns the distribution of the input data, while the discriminator, playing the adversarial role, takes fake and real data as input and evaluates them for authenticity [65]. GANs now pervade deep learning applications, improving their performance and prediction capability. The variational auto-encoder (VAE), a generative variant of the auto-encoder, can also be trained adversarially in combination with a GAN. Many studies have adopted the VAE in prognostic applications, and it has shown good performance in anomaly detection and RUL prediction tasks.
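The adversarial interplay between the two parts corresponds to the minimax objective of [64]:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

Here D(x) is the discriminator's estimate of the probability that x came from the real data distribution, and G(z) maps a noise sample z to a generated sample; the discriminator maximizes V while the generator minimizes it.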
Yao et al. [66] used a VAE to capture dominant features for unsupervised fault detection applications. Comparisons were established using the KDD CUP 99 [67] and MNIST [68] datasets. Experiments showed that features extracted by the VAE may improve the performance of unsupervised anomaly detection techniques; compared against an AE and KPCA, the VAE achieved the best performance. Huang et al. [69] used a VAE trained with a GAN for long-term prediction of degradation progression and RUL without specifying a particular failure threshold. Critical degradation features are extracted using monotonicity and correlation metrics, and health indicators are constructed and fed with these features to train the model. The VAE consists of an encoder based on a bidirectional LSTM and a decoder based on an auto-regressive LSTM-GMM, whose output is fed to a fully connected layer of a Gaussian mixture model. Experiments on the C-MAPSS, HSSB and lithium-ion battery datasets proved that adversarial training improves the VAE's ability to learn the true distribution of the degradation process, which leads to improved prediction accuracy.