Remaining useful life prediction based on a modified SVR method with health stage division, cluster sampling and similarity matching

Remaining useful life (RUL) prediction is an advanced methodology of prognostics and health management (PHM), and is in advantage of life cycles management of equipment and maintenance cost reduction. Among the data driven methods, support vector regression (SVR) is one of the most suitable methods when there are limited failure history data for analysis. However, many uncertain factors such as individual variation and time varying operating conditions, will lead that the failure time of all equipment statistically dispersive, and with the increase of sample set, such dispersity may inevitably increase and further reflects in the model training. In consequent, the dispersity may cause two drawbacks. On the one hand, the linearity of SVR model will increase with the increase of sample set, and overfitting or underfitting tends to occur. And on the other hand, a single model only performs well in its generalization and robustness, but may lost its effectiveness that it may fail to work well for a new on-service equipment. In order to deal with the two drawbacks, this paper proposes a modified SVR method with health stage division, cluster sampling and similarity matching. Through health stage division and cluster sampling, each state of the whole degradation process can obtain the optimal parameters, then the irrelevant linearity can be reduced. In addition, since that similar input results similar output, the optimal parameters of the most similar testing sample are also suitable for the on-service equipment, and through similarity matching the most similar testing sample can be obtained, thus the drawback of a single model can be avoided. Finally, the effectiveness of the proposed method is verified systematically by a simulated dataset of fatigue crack growth and a real-world degradation dataset of GaAs-based lasers.


Introduction
The operating reliability of mechanical equipment significantly influences the sustainability and competitiveness of manufacturing industry. As the reliability of mechanical equipment decreases with the extension of the operating time, prognostics and health management (PHM) techniques attract more and more attentions [1][2][3]. It determines the optimal maintenance time, inspection interval, spare parts order quantity and other logistics management strategies through efficient and low-cost diagnostic and prognostic activities to achieve the lowest economic cost or equipment failure risk. The key of PHM technology is to predict the equipment's remaining useful life (RUL), i.e. the remaining useful time for a certain part or component to perform its function before its final failure [4]. The accurate results of RUL prediction can help decision makers to make maintenance plans in advance and optimize supply chain management. Compared with traditional reliability-based methods [5,6], existing RUL prediction methods mainly learn the potential degradation law and the failure information through analyzing the condition monitoring data. They can be generally divided into three categories: physics-based methods, data-driven methods and their combinations [7,8]. A physics-based method always requires a broad understanding of the physical mechanism of the equipment, but an excess of physical parameters make it difficult to get an accurate physical model [9]. In comparison, when the explicit physical model is unknown, data-driven methods can directly analyze the condition monitoring data, and predict the RUL.
Existing data-driven methods can be divided into: (1) the degradation-modeling based methods [8], which aims to model the degradation evolution process of equipment; (2) the machine learning methods [10], which attempts to map the relationship between degradation data and its RUL directly [11].
A degradation-modeling based method mainly uses a statistical model to learn the degradation evolution from the condition monitoring data, and then estimates the RUL by comparing the predicted results with the pre-set failure threshold. Examples such as: Wiener process models [12,13], Gamma process models [14,15], inverse Gaussian process models [16,17], hidden (semi) Markov models (HMM/HSMM) [18,19], etc. The inherent drawbacks of the degradation-modeling based methods derive from two strong premises: (1) the degradation process of equipment's performance should follow a certain statistical model, such as the continuous-time Markov chain, the hidden Markov model, the hidden semi-Markov model or the Wiener process, .etc.; (2) the statistical property of the degradation model, for example the transition probability matrix for Markov-based models, should be a priori known or estimated [20]. However, for practical instances, theoretical statistical models such as Markov chain are very hard to be verified, and the transition probability matrix is often hard to estimate or inaccessible. Therefore, the applicability of the degradation-modeling based methods is limited in engineering practice.
In contrast, the machine learning methods such as artificial neural networks (ANN) [21,22], similarity-based RUL prediction (SbRP) method [11,23] and support vector machine (SVM) [24][25][26][27][28][29][30][31], can effectively avoid these issues. Although the machine learning methods don't provide a probability density function (PDF) estimate of the RUL, they are capable of dealing with prognostic issues of complex systems whose degradation processes are difficult to be interrelated by physics-based methods or degradation-modeling based methods, and have been widely studied and applied in recent years.
In practice, considering the validity and expenditure, the main challenge of PHM is how to perform RUL prediction with the limited amount of failure samples. To the best of our knowledge, support vector regression (SVR) as the most common application of SVM algorithm in PHM field, was originally proposed for small-sample analysis [24], it can effectively estimate the RUL with limited failure samples. Widodo and Yang [25] developed an intelligent system using survival analysis and SVR to predict the survival probability of bearings. Loutas et al. [26] reported the ɛ-SVR, and employed it for estimating the RUL of rolling element bearings. Benkedjouh et al. [27,28] used SVR to map the degradation time series into nonlinear regression, and then fitted it into the power model for RUL prediction of the mechanical equipment. Liu et al. [29,30] proposed an improved probabilistic SVR model to predict the RUL of equipment components of a nuclear power plant. Fumeo et al. [31] developed an online SVR model to predict the RUL of bearings by optimizing the tradeoff between accuracy and computational efficiency. Tao et al. [20] considered the dynamic multi-state operating conditions, and trained the corresponding SVR model under different operating conditions, then the RUL of the on-service equipment can be predicted under time varying operating conditions. Some other SVR-based methods were developed in [32][33][34][35]. These studies promote the application of SVR-based methods in PHM, but due to the dispersity of samples' failure time, there is an inevitable drawback that the linearity of SVR model will increase with the increase of sample set, and overfitting or underfitting tends to occur [9]. Furthermore, such dispersity also represents the diversity of equipment's degradation processes, thus a single SVR model is not always the best one for a brand-new equipment.
Therefore, in order to deal with the two drawbacks, this paper proposes a modified SVR method with health stage division, cluster sampling and similarity matching. As we shall show, the novelties and contributes of this paper exhibit in the following aspects: (1) The reasons of SVR model linearization increasing with data set are analyzed.
(2) It combines health stage division and cluster sampling in SVR model training, and can obtain the optimal parameters of each health stage, thus the drawback that the linearity of SVR model will increase with the increase of sample set can be solved effectively.
(3) It applies similarity matching to select the optimal parameters in the online prediction phase. Based on the idea of similar inputs mean similar outputs, the optimal SVR model for the on-service equipment can be obtained from its most similar testing sample, thus the limitation of a single SVR model can be avoided.
The rest of this paper is organized as below. Section 2 introduces several related theories of the proposed method. Section 3 discusses the problem of a traditional SVR-based RUL prediction method. Section 4 describes the modified SVR method for online RUL prediction. Section 5 analyzes systematically the proposed method by two datasets. Finally, Section 6 concludes the whole paper.

Support vector regression
SVR is the successful extension of SVM for dealing with regression problems [36]. It aims to find a maximum margin in order to minimize the error of the missed training data. Given a training set = { , } =1,2,…, with input data ∈ and output data ∈ , the regression function is expressed as follow: = ( ) = ( ) + (1) where ( ) denotes the feature of inputs, and are the coefficients, and can be estimated by minimizing the following regularized risk function: where the first term 1 2 ‖ ‖ 2 is called the regularized term and used to estimate the flatness of the function, the second term is the so-called empirical error or training error. is the specified criteria chosen to balance the tradeoff between the flatness of the function and the training error. ℓ is the ε linear insensitive loss function: where ε is the tube size of SVR. If the penalty factor is larger, it is proved that the training error is large and the sample penalty of ε is larger, otherwise the opposite is true; if ε is smaller, the error of the regression function is smaller.
Then two positive slack variables and * are introduced to represent the distance from desired values to the corresponding boundary values of the ε-tube. Eq. (2) can be rewritten as: Finally, by introducing Lagrange multipliers and exploiting optimality constrains, Eq. (1) is transformed into the form rewritten as follow: where , * are the Lagrange multipliers which satisfy the equality * = 0, ≥ 0, * ≥ 0 and obtained by maximizing the dual function of Eq. (4) which Karush-Kuhn-Tucker conditions are applied.
The dual Lagrange form of Eq. (4) is given by: with the constrains: where � , � is the kernel function, and its value is equal to the inner product of two vectors and in the feature space ( ) and � �, i.e. � , � = ( ) · � �. Any function satisfied Mercer's condition [24] can be used as the kernel function, such as linear kernel, polynomial kernel, radial basis function (RBF) kernel and sigmoid kernel, etc.

Cluster sampling
In sampling theory, the coherent sample set composed of several connected basic units is called a cluster. And for cluster sampling, the population is first divided into different sub-clusters according to some criteria, then each sub-cluster is taken as the sampling unit that be extracted integrally for analyzation. Due to the advantages of simplicity, convenience and cost-reduction, cluster sampling is widely used for statistical analysis, and developed a variety of variants for different scenarios, such as adaptive cluster sampling [37], stratified cluster sampling [38], etc. The main disadvantages of cluster sampling are that the sample distribution is not wide and the representation for population is relatively poor. However, it does a good job of preserving the integrity of local information. Thus, the method of cluster sampling is employed in this paper to improve the SVR-based method.

Similarity matching
Similarity matching is the key procedure of the SbRP method. As an emerging and universal data-driven method, SbRP has been rapidly studied and developed in recent years. It predicts the RUL of an onservice equipment by a weighted arithmetic mean of those of reference ones [23,39] and is capable to realize long-term estimation and dispense with the degradation modeling [40]. The core idea of the SbRP method is that similar inputs product similar outputs. And based on similarity matching, the most similar trajectories of the reference samples are obtained, then the RUL of the on-service equipment can be calculated by the weighted average RUL of those trajectories [41]. Thus, we consider that the most similar sample's optimal SVR model parameters are also the optimal for the on-service equipment, and employ similarity matching method for parameters selection in the online prediction stage. The most commonly used method of similarity matching is the distance-based algorithms, such as the Euclidian, Manhattan, and Canberra distances.

Problem statement
A large number of experiments and engineering cases show that even the same category or the same batch of equipment, their degradation curves are often different with each other due to the difference of internal structure and the variability of operating environment [42], i.e. the failure time of all equipment are statistically dispersive, reference to Fig. 1. With the increase of sample set, such dispersity may inevitably increase and further reflects in model training. Fig. 1 The degradation curves of the same category of equipment The goal of SVR is to solve the function ( ) that has at most ε deviation from the obtained targets of for all the training data, errors can be ignored if they are less than ε, namely the loss function ℓ doesn't calculate the data points between ( ) + ε and ( ) − ε (according to Eq. 3).
For a trained SVR model, ε is a constant and the ε-tube is unique. However, the uncertainties [43] in the early stages are always more than that in the late stages, thus the dispersity of RUL in the early stages is usually more obvious than that in the late stages, which causes that in the early stages there are more data points outside the ε-tube, and therefore, follow a constant ε may cause that the data points in the late stages have less impact in model training, thus the linearity will increase with the increase of sample set, and overfitting or underfitting tends to occur. For example, Fig. 2 shows the data point distribution of a training data set, they spread out in the early stages and then gradually concentrate, and when using an SVR model to fit them, there is an underfitting for all those data if > . Therefore, the ε-tube may be too narrow in the early stages and too wide in the late stages. Fig. 2 The problem of RUL prediction by SVR method In addition, the diversity of equipment's degradation processes may lead that a single SVR model constructed offline fails to work well for a brand-new equipment.

Methodology
As the problem discussed in Section 3, the dispersity of equipment's failure time leads the constant εtube don't suit for the whole regression process of SVR, health stage division is suit for dealing with this issue. By health stage division, the degradation processes can be divided into several stages according to the different degradation patterns, each stage can be analyzed separately. For instance, Hu et al. [44] divided the degradation processes of generator bearings into four stages based on the change points of confidence levels. Kimotho et al. [45] divided the degradation processes of bearings into five stages by analyzing the changes of frequency amplitudes in the power spectral density. They consider that the degradation process of a product can usually be divided into two or more stages, each stage presents different degradation patterns, and by the multi-stage RUL prediction, the accuracy is improved.
However, many kinds of equipment's degradation processes don't have the obvious change points, and the high-quality failure historical data often less in engineering practice, such issues lead the multi-stage RUL prediction in challenge. For a multi-stage SVR method, if the training sample set is small, there is an inevitably issue that the corresponding SVR model of two consecutive stages may be overfitting or underfitting in their joining part, for the reason that this part is the predicted part of regression. Reference to Fig. 3, the predicted parts are always full of uncertainty, and may be inaccuracy and unsatisfactory. Therefore, cluster sampling is introduced to solve this problem.
The application mode of cluster sampling in the proposed SVR method is shown in Fig. 4. In order to obtain the optimal model parameters of stage k, the corresponding data of stage k of the testing data (such as the data = { , ⋯ , } shown in Fig. 4) are cluster sampled about times and added into the original testing data. Then on the one hand, the weight of one of the data points of increases from 1 to , and where 1 ≤ − + 1 ≤ and 1 = +1 + ≤ ≤ +1 + , thus ≥ 1 is always right, and the partial testing data plays more roles in parameters optimization. On the other hand, other data except greatly decrease the uncertainty in the process of SVR model construction, even though their weights are relatively lower, if these data outside the ε-tube (far from the optimal regression curve), the loss function ℓ will increase. Thus, by introducing the method of cluster sampling, overfitting or underfitting in the joining part of two consecutive stages can be reduced. Fig. 3 The issue of just simply divide the degradation into several stages and construct the corresponding SVR models Fig. 4 The difference of model construction between the proposed SVR method and the traditional SVR method Additionally, in order to refrain from focusing on generalization and neglecting the impact of the diversity of equipment's degradation processes, we introduce similarity matching to select the better parameters in the online prediction stage. The main flowchart of the proposed method is shown in Fig.  5, and described as follows. Step 1: Divide the original data into training set and testing set. As far as possible, the degradation processes of the testing samples should be significantly different from each other. Thus, each testing sample is of representative.
Step 2: Select the appropriate kernel function. The key of SVR is to select the type of kernel function, which mainly includes linear kernel, polynomial kernel, RBF kernel and sigmoid kernel. The most widely used of these functions should be RBF kernel. Whether the small sample set or large sample set, high dimensional or low dimensional, the RBF kernel function is applicable. Compared with other functions, it has the following advantages: (1) The RBF kernel can map a sample to a higher dimensional space, and the linear kernel is a special case of RBF, which means that if you consider using RBF, there is no need to consider the linear kernel. (2) Compared with polynomial kernel function, RBF needs to determine fewer parameters, and the number of kernel function parameters directly affects the complexity of the function. In addition, when the order of the polynomial is relatively high, the element value of the kernel matrix will tend to infinity or infinitesimal, while the RBF is above, which will reduce the difficulty of numerical calculation. (3) For some parameters, RBF and sigmoid kernel have similar performance.
Step 3: Divide the HI of testing sample into stages. Health stage division should follow some criterions, either according to the change points or just evenly divide. Anyway, compared to the entire time series, the internal dispersion of each stage has shrunk.
Step 4: Cluster sampling times for the stage (1 ≤ ≤ ), and put back into the original data. Reference to Fig. 4, the modified testing sample will be used for training parameters for the stage .
Step 5: SVR model training, and get the optimal parameters of the stage . Some algorithms can be used for solve the parameters optimal problem, such as genetic algorithm (GA), particle swarm optimization (PSO), grid search, etc.
Step 6: Repeat step 3, 4, 5 until the whole testing set get the optimal parameters for all the stages.
Step 7: For the HI of the on-service equipment, perform similarity matching with the whole testing set. Similarity matching in this paper mainly focuses on the similarity of degradation curves.
Step 8: Online prediction, select the parameters of the most similar testing sample's corresponding stage for RUL prediction.
Step 9: Finally, output the RUL of the on-service equipment.

Case studies
In this section, we first use a simulated dataset to demonstrate the effectiveness of the proposed method. Then, it is applied to the GaAs-based lasers dataset. The RBF kernel is used in the proposed method and traditional SVR method, and the parameters optimization problem is solved by Grid search method.
where is the radius of the RBF kernel function. In order to demonstrate the performance of the proposed method, we employ mean absolute error (MAE) and root-mean square error (RMSE) as the performance metrics to evaluate the accuracy of the proposed method: represents the total number of data points in the on-service sample's HI, is the accurate value and � represents the predicted value. the lower the or , the better the accuracy. The performance of the proposed method is illustrated by comparing with the traditional SVR method and SbRP method.

Simulation study
In this subsection, the prediction is illustrated and the performance is evaluated through numerical simulations. The degradation curves dataset of 30 units is obtained by an exponential degradation model [46]: ( ) = + + ( ) (12) where ( ) is the log value of the degradation data, ~( 0 , 0 2 ), ~( 1 , 1 2 ) and the noise ( )~(0, 2 ). In this simulation study, all the initial damages start at 0 inch, and the failure threshold (FT) is set to be 50 inches. The preset values of the model parameters are given in Table 1. Fig. 6 shows the degradation curves of the 30 units. The numerical simulations are all terminated at 21 million of cycles. In this study, we choose 3 units as the on-service samples, 8 units as the testing samples with their failure time have relatively balanced distribution, others as the training samples. Online prediction is performed for the three on-service samples by different methods. Table 1 Preset parameters of the exponential degradation model Parameters Preset values 1 0.2 4 0.6 0.06 Fig. 6 The degradation curves of the simulation dataset For the 8 testing units, we divide their respective degradation processes into 5 stages as the interval of each stage is 10 inches, i.e. =5. Then the optimal number of cluster sampling can be analyzed from repeated experiments. When =0, we define the training error of the corresponding stage as a standard unit, i.e. define ( = 0) = 1, then the variation trends of the training error with the cluster sampling number are shown clearly in Fig. 7. Obviously, as the increasing, training errors are nonincreasing, and finally converges at a definite level. Therefore, according to Fig. 7, =25 is an appropriate value for cluster sampling in this study.
In the online prediction phase, the prediction results of the three on-service samples are shown in Fig. 8. For the 1st on-service sample, 8 predictions were made in the whole life cycles, the proposed method has 5 prediction errors with the minimum (62.5%), while traditional SVR method has 0 (0%) and SbRP method has 3 (37.5%), results are listed in Table 2. And for the 1st on-service sample, the MAE and RMSE of the proposed method are all lower than the other two methods. For the 2nd onservice sample, 12 predictions were made in the whole life cycles, the proposed method has 3 prediction errors with the minimum (25%), while traditional SVR method has 1 (8.33%) and SbRP method has 8 (66.67%), results are listed in Table 3. And for the 2nd on-service sample, the MAE and RMSE of the proposed method are all lower than traditional SVR method, but higher than SbRP method. For the 3rd on-service sample, 15 predictions were made in the whole life cycles, the proposed method has 7 prediction errors with the minimum (46.67%), while traditional SVR method has 7 (46.67%) and SbRP method has 5 (33.33%), results are listed in Table 4. And for the 3rd on-service sample, the MAE and RMSE of the proposed method are all lower than traditional SVR method, but the RMSE of the proposed method is higher than SbRP method while its MAE is lower than SbRP method.
Results show that the proposed method is of the better performance than traditional SVR method, but the pros and cons between the proposed method and SbRP method are unclear, and need further analysis in the next subsection.

Application to GaAs-based semiconductor lasers
In this subsection, the proposed method is applied to a degradation dataset of GaAs laser devices, which is taken from Meeker and Escobar (1998) [47]. Degradation of the laser device is measured as the operation current, which increases over time. Degradation data from 15 testing units are collected over time with measure frequency 250 h and terminal time 4000 h, their curves are shown in Fig. 9. In this paper, a laser is assumed to have failed if the percentage of its operating current exceeds a predefined threshold level FT=6%. Note that the observation time points are recorded at every 250 h, this leads to the inability to know the true life, i.e. the time arriving the failure threshold. Due to the reasonable degradation process of laser is an approximate linear degradation, the most possible failure time considered as the real one can be calculated by local linear regression method. We choose unit 5, 6, 7 and 13 as the testing samples and unit 1 and 15 as the on-service samples, while others as the training samples. Online prediction is performed for the two on-service samples by different methods. Fig. 9 The degradation curves of GaAs-based laser data For the 4 testing units, we divide their respective degradation processes into 6 stages equally, i.e. =6. Then the variation trends of the training error with the cluster sampling number are shown clearly in Fig. 10. Obviously, as the increasing, training errors are non-increasing, and finally converges to a definite level. Therefore, according to Fig. 10, =25 is an appropriate value for cluster sampling in this study.
In the online prediction phase, the prediction results of the three on-service samples are shown in Fig. 11. For unit 1, 6 predictions were made in the whole life cycles, the proposed method has 4 prediction errors with the minimum (66.67%), while traditional SVR method has 2 (33.33%) and SbRP method has 0 (0%), results are listed in Table 5. And for unit 1, the MAE and RMSE of the proposed method are all lower than the other two methods. For unit 15, 11 predictions were made in the whole life cycles, the proposed method has 7 prediction errors with the minimum (63.64%), while traditional SVR method has 2 (18.18%) and SbRP method has 2 (18.18%), results are listed in Table 6. And for unit 15, the MAE and RMSE of the proposed method are all lower than the other two methods. Fig. 10 The variation trend of the error in each stage of the testing samples Fig. 11 The prediction results of the on-service samples of GaAs-based lasers data

Results and discussion
From the above two subsections, the proposed method all performs better than traditional SVR method, the introduction of cluster sampling in SVR method can effectively reduce the training error. But we also have to be aware of that as the number of training samples increase, the proposed method sometimes performs not as good as SbRP method. Nonetheless, when the training samples are limited, the proposed method performs better than SbRP method.

Conclusion
This paper proposes a modified SVR method with health stage division, cluster sampling and similarity matching. Through health stage division and cluster sampling, the training error of each stage is nonincreasing, and the whole training error is decreasing with the number of cluster sampling, until it finally converges to a definite level. Therefore, the drawback that the linearity of SVR model will increase with the increase of sample set can be solved effectively. Additionally, similarity matching is introduced to select the optimal parameters in the online prediction phase, it solves the limitation of a single SVR model. To demonstrate the advantages of the proposed method, a simulation dataset and the GaAs-based lasers dataset are applied to analyze. Results show that the proposed method is of better effectiveness than the traditional SVR method, and when the training samples are limited, the proposed method performs better than SbRP method. Despite the prediction results by SbRP method are sometimes more accurate than the proposed method, pointwise similarity measure as the key procedure of SbRP method often require extensive computation, and with the training samples increasing, it is much more time consuming than the proposed method. Consequently, the proposed method is of great significant, it extends the studies of SVR methods for RUL prediction, and is of effectiveness in engineering practice.