A deep learning-based conditional system health index method to reduce the uncertainty of remaining useful life prediction

Many recent data-driven studies have used sensor profile data for prognostics and health management (PHM). However, existing data-driven PHM techniques are vulnerable to three types of uncertainty: sensor noise inherent to the sensor profile data, uncertainty regarding the current health status diagnosis caused by monitoring a single health index (HI), and uncertainty in predicting the remaining useful life (RUL), which is affected by unpredictable changes in system operating conditions and the future external environment. This study proposes a deep conditional health index extraction network (DCHIEN) for PHM to effectively manage these three types of uncertainty. DCHIEN is a model that combines a stacked denoising autoencoder that extracts high-level features robust to sensor noise with a feed-forward neural network that produces an HI based on user-defined monitoring conditions. This approach supports system health monitoring using the conditional HI, as well as prognostics using RUL interval predictions. Extensive experiments were conducted using NASA’s turbofan engine degradation dataset. The results show that the proposed method achieves a superior RUL prediction performance compared to state-of-the-art methods and that uncertainties can be effectively managed.


Introduction
Prognostics and health management (PHM) is a systematic approach for diagnosing a system's current health status and predicting the failure time (Zhao et al. 2014; Zhang et al. 2019) based on information from sensors, domain knowledge, and external environmental factors. Effective PHM can significantly reduce system maintenance and operational costs and ensure long-term system stability (Huynh et al. 2014; Zhu et al. 2019). Recently, PHM has played a key role in the stable and cost-efficient operation of ship and aircraft engines (Sun et al. 2015; Wang et al. 2019; Diez-Olivan et al. 2018), lithium-ion batteries (Rezvanizaniani et al. 2014; Waag et al. 2014; Liu et al. 2015), and equipment for various manufacturing processes (Yang and Lee 2012; Yang et al. 2016; Zhao et al. 2017).
In the past few years, advances in statistical learning and machine learning have enabled engineers to analyze the vast amounts of sensor profile data provided by multiple sensors in engineering systems (Wijayasekara et al. 2014). The information extracted by these learning techniques helps engineers determine not only the system's current health status but also the expected failure time.
Data-driven PHM approaches are broadly divided into Bayesian and machine learning approaches. In general, Bayesian approaches transform the sensor profile data into a health index (HI) that represents the system's health status and model HI transitions over time based on certain probability distributions. Because HI transition models estimate the HI at the next time point based on previous HI change patterns, recursive HI estimation can predict the future performance degradation patterns of a system and ultimately predict its remaining useful life (RUL) (Wang and Gao 2015; Lim et al. 2017; Hong et al. 2015; Si et al. 2017).
Machine learning approaches can be broadly categorized into two types. The first type predicts a system's future degradation patterns by extracting the HI at certain time points from the sensor profile data and then learning the change patterns of the extracted HI using machine learning models (Yang et al. 2016). Various models, including autoregression (Qian et al. 2013), support vector regression (SVR) (García Nieto et al. 2015; Tran et al. 2012), and long short-term memory (LSTM), have been used to learn the change patterns. The second type directly predicts the RUL without learning the health degradation pattern of a system. Traditionally, logistic regression (Liao et al. 2006), multilayer perceptron (Tian et al. 2010; Tian 2012), and SVR (Benkedjouh et al. 2013) have been used as the RUL prediction models in this type of approach.
In the past few years, efforts to employ deep learning models have increased. For example, Jiang and Kuo (2017) and Li et al. (2018) used a convolutional neural network (CNN) to predict the RUL of aircraft engines. They showed that CNNs are better than traditional machine learning models in terms of prediction performance. Similarly, another deep learning model, LSTM, was successfully applied to predict the RUL of computer numerical control milling machine cutters (Zhao et al. 2017) and aircraft engines (Wu et al. 2018). Recently, Zhang et al. (2022) proposed a bidirectional gated recurrent unit with a temporal self-attention mechanism to consider the reverse flow of profile data and to reflect the difference in RUL prediction results at different time instances. They suggested that the bidirectional gated recurrent unit, as an improved LSTM, can improve the prediction performance by simplifying the network structure. Deutsch and He (2018) used a deep belief network to develop an RUL prediction model for rotating components such as gears. They showed that when a vast amount of sensor profile data is used in PHM, deep learning models can yield better performance than shallow machine learning models that require prior knowledge of the model structure and signal processing technology. Recently, advanced learning techniques have been introduced to cope with limited failure data in real-world PHM cases. For example, Listou Ellefsen et al. (2019) applied a semi-supervised learning technique to train a new deep neural network (DNN) with insufficient high-quality labeled data. Similarly, Jang and Kim (2021) proposed a Siamese network-based RUL prediction method that utilizes training samples as references to cope with situations in which limited historical data are available.
However, the aforementioned data-driven PHM methods are vulnerable to the influence of uncertainties. In this paper, we consider the following three types of uncertainty, which are redefined based on Sankararaman et al. (2013):
1. Noise occurs when sensors measure the system parameters and adds variability to the system health status diagnosis and RUL prediction (Daigle et al. 2014; Javed et al. 2015). Thus, noise-reduced data should be input into the PHM model.
2. System degradation patterns occur for a wide variety of reasons according to the system characteristics. Even the degradation patterns of a single system can vary with its health status (Eker et al. 2012). Thus, the PHM model often does not effectively retrieve information about the system health status (as represented by the HI) from the sensor profile data. Therefore, the PHM model should be able to handle the uncertainty in diagnosing the current health status by providing a variety of HI monitoring results.
3. The task of predicting the RUL is highly uncertain. When predicting the RUL, it is difficult to fully consider the system operating conditions and external environment changes that may occur in the future, leading to uncertainty in the prediction results (Sankararaman 2015). Therefore, it is not reliable to model the PHM system assuming that accurate RUL prediction is possible. Rather, the PHM system should be responsive to prediction errors by providing supplemental information.
This paper proposes a PHM method that focuses on managing the three types of uncertainty mentioned above. The proposed PHM method is based on the deep conditional HI extraction network (DCHIEN) presented here. DCHIEN is a model that combines a stacked denoising autoencoder (SDAE) (Vincent et al. 2010), which extracts high-level features that are robust in regard to the noise inherent in the input data, with a feed-forward neural network. The network predicts the HI after considering the user-defined monitoring conditions and the features extracted by the SDAE. When the actual health status of the system is the same as or similar to the user-defined monitoring conditions, DCHIEN properly learns the degradation patterns for those conditions. Therefore, the proposed model allows engineers to set multiple monitoring conditions and monitor the changes in conditional HIs (CHIs) for each set condition. In addition, DCHIEN can be used to estimate the RUL prediction intervals. Therefore, even when the RUL point predictions are poor, engineers can establish conservative maintenance plans based on the RUL prediction intervals.
The main contributions of the paper are as follows:
• The novel model DCHIEN is proposed to address the three major types of uncertainty. In DCHIEN, a denoising autoencoder is first pretrained to minimize the risk of sensor noise and is used as a feature extractor. Then, we apply a new training method to learn CHIs.
• DCHIEN can help engineers manage systems effectively by providing a diverse set of information, including multiple CHIs, RUL point estimates, and RUL prediction intervals.
• We compare the proposed RUL prediction method with state-of-the-art methods, and the comparison results show that the proposed method achieves the best performance. In addition, it is shown that the prediction performance can be significantly improved by considering the points in the very small RUL prediction interval as possible points of failure.
The remainder of this paper is organized as follows: In Sect. 2, related works are reviewed. Section 3 describes the structure and learning method of the DCHIEN model. Section 4 presents the PHM method based on the DCHIEN model. In Sect. 5, the proposed PHM method is evaluated and compared with existing RUL prediction methods. Finally, Sect. 6 describes the conclusions drawn from this study and briefly discusses the limitations of this paper and future research directions.

Uncertainty management in prognostics and health management
There are several sources of uncertainty that affect the current operations and future behaviors of systems (Sankararaman et al. 2013; Sankararaman 2015), so uncertainty management is important for PHM. It is often not meaningful to perform diagnosis and prognosis without considering the uncertainty inherent in the system of interest. There are three major uncertainties described in Sankararaman et al. (2013): state uncertainty, future loading uncertainty, and process noise. First, in most cases, the true state at any instant is not known precisely and is impossible to estimate with certainty due to some uncertainty sources, including sensor measurement noise and model uncertainty. Let $s_t$ be the state of a system and $\eta_t$ be the sensor measurement noise at $t$. Then, the state uncertainty can be defined in terms of $f$, which reveals the current health status of the system of interest, and $\hat{f}$, a model that approximates $f$. Second, we cannot predict the future loading and environmental conditions that are necessary for estimating the RUL. This is called future loading uncertainty. In addition, process noise in the future cannot be estimated in advance. Let $s_{t:t+\tau} + \eta_{t:t+\tau}$ be the set of true states considering process noise, and let $\hat{s}_{t:t+\tau}$ be the set of states estimated by a model from $t$ to $t+\tau$. Then, the uncertainty in RUL prediction can be defined as $|q(s_{t:t+\tau} + \eta_{t:t+\tau}) - q(\hat{s}_{t:t+\tau})|$, where $q$ outputs the health status at the last instant in the input. Not many studies have taken these uncertainties into account in PHM modeling. Only a few studies have proposed estimating the RUL prediction interval and using it to mitigate the effect of the uncertainty in RUL prediction. The main idea is that we can formulate a more robust maintenance strategy by considering all points in the prediction interval as possible failure points. For example, Bressel et al. (2016) estimated probability bounds by repeating extended Kalman filter-based RUL estimation.
Similarly, Liao et al. (2018) estimated the prediction interval of the RUL using LSTM and showed that more effective maintenance decisions can be obtained with prediction intervals. Liu et al. (2018) showed that the RUL prediction error can be reduced by considering the uncertainty of random degradation.
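For reference, the two uncertainty notions discussed in this section can be written side by side. The second expression is the one given in the text. The first is only an assumed form that mirrors it (the true health status of the true state compared with the model's estimate from the noisy state) and should not be read as the original definition. The labels $U_{\text{state}}$ and $U_{\text{RUL}}$ are hypothetical names introduced here for illustration.

```latex
% Assumed form of the state uncertainty (not confirmed by the source text)
U_{\text{state}}(t) = \left| f(s_t) - \hat{f}(s_t + \eta_t) \right|

% Uncertainty in RUL prediction, as stated in the text
U_{\text{RUL}}(t, \tau) = \left| q\left(s_{t:t+\tau} + \eta_{t:t+\tau}\right) - q\left(\hat{s}_{t:t+\tau}\right) \right|
```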

Stacked denoising autoencoder
An autoencoder is a simple feed-forward neural network that consists of an input layer, a single hidden layer, and an output layer. The autoencoder is trained to reconstruct the input data in the output layer. Here, the hidden layer learns the nonlinear features to effectively reconstruct the data. A denoising autoencoder (DAE) is an extended autoencoder version, the goal of which is to reconstruct the original input from an input corrupted by random noise, and it is trained to minimize the reconstruction loss. A well-trained DAE can extract features that are robust to the noise inherent in the input data (Vincent et al. 2008;Jang et al. 2019).
An SDAE has a structure that stacks several layers of a DAE; it uses a layer-by-layer training method in which the hidden layers are trained sequentially and individually to solve the gradient vanishing problem (Hochreiter 1998). An SDAE trained using the layer-by-layer training method can extract high-level features that are robust to noise. The SDAE training method is discussed in detail in Vincent et al. (2010).
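The denoising idea can be illustrated with a very small example. The following is a minimal sketch, assuming additive Gaussian corruption, a single tanh hidden layer, a linear output layer, and plain stochastic gradient descent on the squared reconstruction error; the class name `TinyDAE`, the toy data, and all hyperparameters are hypothetical and are not taken from the paper.

```python
import math
import random


def corrupt(x, noise_std, rng):
    """Additive Gaussian corruption (one common choice of random noise)."""
    return [v + rng.gauss(0.0, noise_std) for v in x]


class TinyDAE:
    """One-hidden-layer denoising autoencoder: tanh hidden units, linear output."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.W2, self.b2)]

    def train_step(self, x_clean, x_noisy, lr):
        """One SGD step: reconstruct the clean input from the corrupted one."""
        h = self.encode(x_noisy)
        y = self.decode(h)
        # Gradient of 0.5 * sum((y - x_clean)^2) with respect to the output y
        dy = [yi - xi for yi, xi in zip(y, x_clean)]
        # Back-propagate to the hidden pre-activations (tanh' = 1 - h^2)
        dh = [sum(self.W2[i][j] * dy[i] for i in range(len(dy))) * (1.0 - h[j] ** 2)
              for j in range(len(h))]
        for i in range(len(dy)):
            for j in range(len(h)):
                self.W2[i][j] -= lr * dy[i] * h[j]
            self.b2[i] -= lr * dy[i]
        for j in range(len(h)):
            for k in range(len(x_noisy)):
                self.W1[j][k] -= lr * dh[j] * x_noisy[k]
            self.b1[j] -= lr * dh[j]


def reconstruction_loss(dae, data):
    """Mean squared reconstruction error on uncorrupted inputs."""
    total = 0.0
    for x in data:
        y = dae.decode(dae.encode(x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)


rng = random.Random(0)
# Toy 4-dimensional data that actually lies on a 1-dimensional curve
data = [[a, a, 1.0 - a, 1.0 - a] for a in (rng.random() for _ in range(50))]
dae = TinyDAE(n_in=4, n_hidden=2, rng=rng)
loss_before = reconstruction_loss(dae, data)
for _ in range(200):
    for x in data:
        dae.train_step(x, corrupt(x, 0.05, rng), lr=0.05)
loss_after = reconstruction_loss(dae, data)
```

In a stacked version, a second autoencoder of the same kind would be trained on the outputs of `encode` of the first one, which is the layer-by-layer scheme described above.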
An SDAE is a high-level feature extractor that learns through unsupervised learning. A DNN is formed by connecting an SDAE to a feed-forward neural network that predicts an output based on the higher-level features extracted by the SDAE. Therefore, the SDAE training process can be considered a pretraining process that initializes the weights of the entire network. Then, the pretrained DNN is trained again through a back-propagation algorithm to find the relationship between the input and output, a process called fine-tuning. The fine-tuning process for DCHIEN training is described in detail in Sect. 3.

Let $Z_L$ be the feature vector produced by the SDAE with $N_L$ elements, and $C$ be a monitoring condition. Then, the hidden representation of the CHI extractor $Z_h$ is computed as shown in (1), where $W_h^j$ and $b_h^j$ are the $j$th weight matrix and bias vector of the CHI extractor, respectively, and $[C/Z_L] = [C, z_{L1}, \ldots, z_{LN_L}]^T$. When $Z_h$ is mapped to the CHI, $h$, the sigmoid function is used as the activation of the DCHIEN output node so that the CHI is assigned a value between zero and one, as shown in (2). For simplicity, we define $f$, the function that produces $h$ through the SDAE and CHI extractors, as shown in (3).
In this paper, the CHI is intended to show the relative value of a given condition C compared to the current health status, where C is the RUL value expected by an engineer. Thus, DCHIEN is trained to produce a CHI of 1 if C is too large compared to the true RUL. On the other hand, a CHI of 0 is produced if the expected RUL (C) is too small. Thus, we set 0.5 as the middle point, at which the true RUL and expected RUL are equal.
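The mapping from the concatenated vector [C/Z_L] to a CHI can be sketched as a small forward pass. This is an illustration only, assuming one tanh hidden layer followed by a sigmoid output node; the function name `chi_forward`, the hand-picked weights, and the scaling of C to the unit interval are assumptions made for the example and are not trained values or settings from the paper.

```python
import math


def chi_forward(features, condition, W_h, b_h, w_out, b_out):
    """Map SDAE features and a monitoring condition C to a CHI in (0, 1)."""
    x = [condition] + list(features)            # [C / Z_L]: C is prepended to the features
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W_h, b_h)]      # hidden representation of the CHI extractor
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))           # sigmoid keeps the CHI between 0 and 1


features = [0.2, -0.4, 0.7]                     # hypothetical SDAE output (N_L = 3)
W_h = [[0.5, 0.1, -0.2, 0.3],
       [0.8, -0.3, 0.4, 0.1]]                   # 2 hidden nodes x (1 + N_L) inputs
b_h = [0.0, 0.0]
w_out = [1.0, 1.0]
b_out = 0.0

chi_low = chi_forward(features, 0.2, W_h, b_h, w_out, b_out)   # smaller (scaled) C
chi_high = chi_forward(features, 0.8, W_h, b_h, w_out, b_out)  # larger (scaled) C
```

Because the illustrative weights on C are positive, the larger condition yields the larger CHI here, which matches the intended behavior that a larger C pushes the CHI toward 1.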
The DCHIEN model extracts the CHI, which is information that cannot be provided beforehand. Thus, it requires a fine-tuning method that differs from that of a supervised DNN, in which both the input and output are clearly provided. Let us assume that we are provided with a training dataset that includes the sensor profile data and the RUL values that correspond to each data sample. Specifically, for the monitoring condition C assigned to a sample, a target CHI of 1 is assigned when C is greater than the RUL, 0 is assigned when C is less than the RUL, and 0.5 is assigned if C and the RUL are the same. By repeatedly learning a randomly assigned C and the resulting changes in the CHI for each iteration, the method automatically learns the degradation pattern for all Cs, where the CHI value approaches 1 or 0 from 0.5 as the true RUL moves farther from C.
After training, DCHIEN can be utilized in two ways. First, because this model is trained to output a CHI of 0.5 when the given condition C and the true RUL are equal, we can monitor several degradation patterns according to C. In general, as C increases, degradation occurs earlier because the CHI reaches 0.5 at earlier time instances when the true RUL is high. Second, given the same input profile data, DCHIEN produces a higher CHI for a higher C because the training method makes DCHIEN more likely to output 1 with a larger C. This characteristic is used to suggest an RUL prediction interval.
The specific fine-tuning method is as follows:
1. The number of training iterations $TR$, the batch size used in one weight update $B$, and $\rho$, which is the ratio of data that have their RUL values as C values, are assigned. $\rho$ is needed because the probability that C and the RUL values are identical is extremely low when C is assigned randomly.
2. $B$ samples that have not been used for training in the current iteration are randomly selected as the training batch.
3. $DS_{higher}$, $DS_{lower}$, and $DS_{same}$ are initialized as empty sets $\emptyset$.
4. Samples are randomly selected from the training batch according to $\rho$. For each selected sample, the RUL is assigned as its C value and the sample is included in $DS_{same}$. For the other samples in the batch, $C \sim \mathrm{Uniform}(1, RUL_{max})$ is assigned. Here, $RUL_{max}$ is the maximum RUL of the system, which is defined by the engineer. Additional details on $RUL_{max}$ are given in Sect. 4. The samples in which the assigned C is larger than the RUL are included in $DS_{higher}$, those in which C is less than the RUL are included in $DS_{lower}$, and the remaining samples are included in $DS_{same}$.
5. If the assigned monitoring condition of sample $X_i$ is $C_i$, the batch error $E$ is defined as shown in (4). The weights of DCHIEN are updated through a back-propagation algorithm so that $E$ is minimized.
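The condition assignment in steps 3 and 4 can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names `assign_conditions` and `batch_error` are hypothetical, and the batch error is written here as a mean squared error between the CHI outputs and the 1 / 0 / 0.5 targets, which is only a stand-in because the exact form of (4) is not reproduced in this text.

```python
import random


def assign_conditions(batch_ruls, rho, rul_max, rng):
    """Assign a monitoring condition C and a target CHI to every sample in a batch."""
    n = len(batch_ruls)
    # A fraction rho of the batch receives its own RUL as C (target CHI 0.5)
    same_idx = set(rng.sample(range(n), round(rho * n)))
    groups = {"higher": [], "lower": [], "same": []}   # DS_higher, DS_lower, DS_same
    conditions, targets = [], []
    for i, rul in enumerate(batch_ruls):
        c = float(rul) if i in same_idx else rng.uniform(1, rul_max)
        if c > rul:
            key, target = "higher", 1.0
        elif c < rul:
            key, target = "lower", 0.0
        else:
            key, target = "same", 0.5
        groups[key].append(i)
        conditions.append(c)
        targets.append(target)
    return conditions, targets, groups


def batch_error(outputs, targets):
    """Assumed stand-in for the batch error E: mean squared error on the targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


rng = random.Random(42)
batch_ruls = [125, 110, 90, 75, 60, 45, 30, 20, 10, 5]
conditions, targets, groups = assign_conditions(batch_ruls, rho=0.2, rul_max=125, rng=rng)
```

Each pair `(conditions[i], targets[i])` would then be fed to the network together with the sensor window of sample `i`, and the weights updated to reduce the batch error.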

Proposed PHM method
In this section, the procedure for the proposed PHM method is provided, and Fig. 2 shows an outline. The first step is data preprocessing, which includes min-max normalization, time window processing, and the application of a piecewise linear degradation assumption, which assumes that performance degradation rarely occurs at the beginning of system operation. Subsequently, the preprocessed training dataset is used to train DCHIEN. The trained DCHIEN extracts the CHI from the incoming data and uses it to perform real-time health monitoring and prognostics to predict the RUL point estimate and prediction interval. Detailed descriptions of the data preprocessing method, CHI-based real-time health monitoring, and prognostics are provided in the next subsections.

Data preprocessing
We apply min-max normalization, allowing the system parameter measurement values for each sensor to have the same scale. If the $t$th measurement value of the system parameter $k$ is called $x_{kt}$, its normalized value is defined as
$$x_{kt}^{norm} = \frac{x_{kt} - x_k^{min}}{x_k^{max} - x_k^{min}}$$
Here, $x_k^{max}$ and $x_k^{min}$ are the maximum and minimum values of the measurements of parameter $k$ in the training dataset, respectively.
When using the sensor profile data to diagnose the system status and predict the RUL, more information can be obtained by using temporal sequence data than by using the system parameter values measured at a single point in time because temporal sequence data include momentary system parameter information and information on the parameter change patterns over time (Ramasso et al. 2013). Therefore, this study introduces time window processing and uses multivariate temporal sequence data as a single input sample. For example, if the time window size is $TW$ and the number of system parameters is $K$, the normalized measurements of the $K$ parameters over $TW$ consecutive time points are treated as a single sample, and the time-to-failure at the last time point is the RUL of the sample.
Normally, when an engineering system is newly introduced or reintroduced directly after receiving maintenance, almost no performance degradation occurs during system operation; instead, degradation progresses gradually after a certain operational period (Heimes 2008). Therefore, the model is constructed so that during the initial system operation, the RUL remains constant with no degradation. Then, after a certain time, it is assumed that linear degradation occurs, causing a linear reduction in performance (Jiang and Kuo 2017; Li et al. 2018; Zhao et al. 2017; Wu et al. 2018). To adopt this assumption, we set a maximum RUL value $RUL_{max}$. Samples that have an RUL above $RUL_{max}$ are assigned $RUL_{max}$.
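The three preprocessing steps can be sketched on a toy single-sensor profile. The function names below are hypothetical, the minimum and maximum are taken from the toy series itself rather than from a training dataset, and returning zeros for a constant sensor is an assumption made to avoid division by zero.

```python
def min_max_normalize(values, v_min, v_max):
    """Scale measurements of one parameter to [0, 1] using training-set extremes."""
    span = v_max - v_min
    if span == 0:
        return [0.0 for _ in values]   # assumption: a constant sensor maps to 0
    return [(v - v_min) / span for v in values]


def piecewise_rul(n_cycles, rul_max):
    """RUL labels for a run-to-failure profile, capped at rul_max in early cycles."""
    return [min(n_cycles - 1 - t, rul_max) for t in range(n_cycles)]


def sliding_windows(profile, ruls, tw):
    """Build (window, RUL) samples; the label is the RUL at the window's last cycle."""
    samples = []
    for end in range(tw - 1, len(profile)):
        window = profile[end - tw + 1:end + 1]
        samples.append((window, ruls[end]))
    return samples


raw = [10.0, 15.0, 20.0, 25.0, 30.0]              # one sensor over five cycles
norm = min_max_normalize(raw, min(raw), max(raw))
profile = [[v] for v in norm]                     # one row per cycle, K = 1 parameter
ruls = piecewise_rul(len(profile), rul_max=3)
samples = sliding_windows(profile, ruls, tw=3)
```

With more sensors, each row of `profile` would simply hold K normalized values, so every sample becomes a TW-by-K window.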

Health monitoring using CHIs
Even for a single system, the system performance degradation patterns can vary significantly based on system health (Eker et al. 2012). Considering this, it is proposed that the engineer set several monitoring conditions and monitor the CHIs produced by DCHIEN in response to each monitoring condition. DCHIEN's training process results in an extracted CHI that properly reflects the degradation patterns when the system's state of health is poor under low monitoring conditions C; conversely, it also properly reflects the degradation patterns when the system is healthy under a high C. Therefore, when the engineer monitors multiple CHIs, even if the CHI based on a certain C does not properly represent the system degradation pattern, a CHI that is based on another C may be able to represent that pattern. In this way, the engineer can consider a variety of possibilities for the health status, minimizing the uncertainty in diagnosing the current health status.

RUL and interval prediction
DCHIEN is trained to output 0.5 when the monitoring condition C equals the true RUL. Therefore, the C value that produces a CHI of 0.5 can be considered the most probable estimate of the RUL. Accordingly, $\widehat{RUL}_i$, the predicted value of the RUL for the normalized sample $X_i^{norm}$, is defined as the value of C for which DCHIEN produces a CHI of 0.5. In addition, this paper proposes a method for predicting the interval of the RUL. As a result of the proposed training scheme, ideally, DCHIEN produces a CHI that increases from 0.5 as the input C increases from the true RUL. Conversely, the CHI decreases from 0.5 as C decreases from the true RUL. Thus, considering the uncertainty in predicting the RUL, if we find C values that produce a CHI slightly higher or lower than 0.5, those values can be used as the upper or lower bounds of the RUL interval. Let us assume that a deviation of $\delta$ with respect to 0.5 is allowed in the CHI. Then, the RUL interval $CI_\delta$ is bounded by the C values whose CHIs deviate from 0.5 by at most $\delta$. In practice, 0.01-0.2 is suggested for $\delta$.
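One way to realize the point estimate and the interval is a grid search over candidate C values. The sketch below is an assumption about how such a search could look, not the paper's exact formulas: `predict_rul` picks the C whose CHI is closest to 0.5, `predict_interval` collects all C values whose CHI lies within δ of 0.5, and `toy_chi` is a hypothetical stand-in for a trained DCHIEN evaluated on one fixed sample.

```python
import math


def predict_rul(chi_fn, rul_max):
    """Return the integer C in [1, rul_max] whose CHI is closest to 0.5."""
    candidates = range(1, rul_max + 1)
    return min(candidates, key=lambda c: abs(chi_fn(c) - 0.5))


def predict_interval(chi_fn, rul_max, delta):
    """Return (lower, upper) over all C whose CHI deviates from 0.5 by at most delta."""
    inside = [c for c in range(1, rul_max + 1) if abs(chi_fn(c) - 0.5) <= delta]
    if not inside:
        return None
    return inside[0], inside[-1]


def toy_chi(c, true_rul=60, scale=10.0):
    """Hypothetical CHI curve: increases with C and equals 0.5 at the true RUL."""
    return 1.0 / (1.0 + math.exp(-(c - true_rul) / scale))


rul_hat = predict_rul(toy_chi, rul_max=125)
interval = predict_interval(toy_chi, rul_max=125, delta=0.1)
```

A larger δ widens the interval, which is the trade-off between interval length and coverage examined in the experiments.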

Experimental setup
In this study, extensive experiments are conducted using NASA's C-MAPSS turbofan engine degradation dataset. The dataset was obtained by simulating an engine that has a random initial wear level. In each simulation, a fault degrades the performance of the engine, which was initially considered healthy. The engine is equipped with 21 sensors that measure system parameters such as temperature, pressure, and fan speed during each operation cycle. The C-MAPSS dataset includes four subdatasets collected under different simulation settings: FD001, FD002, FD003, and FD004. They are categorized based on whether the setting has multiple operational conditions and which kinds of faults cause the degradation. A summary of the four subdatasets is given in Table 1. Specifically, FD001 and FD002 are affected by high-pressure compressor faults, while FD003 and FD004 are subject to high-pressure compressor and fan faults. For the training profiles, sensor measurement values from the start of engine operation to the failure point are included. In contrast, the test profiles include sensor measurement values only up to a certain time point before engine failure occurs. Min-max normalization, time window processing, and a piecewise linear degradation assumption, under which $RUL_{max}$ is 125 based on Li et al. (2018), were applied to the datasets. The time window size is set to 30 for FD001 and FD003, 20 for FD002, and 15 for FD004 based on the subdataset information and sensitivity analysis results provided in Li et al. (2018).
To evaluate the proposed RUL prediction method, five performance metrics were used. The first metric is Score, which has commonly been used to measure RUL prediction performance (Listou Ellefsen et al. 2019). Score is defined as follows:
$$Score = \sum_{i=1}^{N} s_i, \quad s_i = \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{d_i/10} - 1, & d_i \ge 0 \end{cases}$$
where $N$ is the total number of test samples and $d_i = \widehat{RUL}_i - RUL_i$ (predicted RUL minus true RUL). The second performance measure is Accuracy, which evaluates the percentage of correct predictions. Here, a prediction is considered correct if $-13 \le d_i \le 10$. We also used common prediction performance measures, including the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), defined in (8), (9), and (10), respectively.
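The five metrics can be computed with a few lines of code. This is a sketch under stated assumptions: Score follows the asymmetric exponential scoring function commonly used with C-MAPSS, RMSE, MAE, and MAPE follow their standard textbook definitions, and MAPE assumes that no true RUL is zero. The function names are illustrative.

```python
import math


def score(pred, true):
    """Asymmetric exponential score; late predictions (d >= 0) are penalized more."""
    total = 0.0
    for p, t in zip(pred, true):
        d = p - t
        total += (math.exp(-d / 13) - 1) if d < 0 else (math.exp(d / 10) - 1)
    return total


def accuracy(pred, true):
    """Percentage of predictions with -13 <= d <= 10."""
    correct = sum(1 for p, t in zip(pred, true) if -13 <= p - t <= 10)
    return 100.0 * correct / len(true)


def rmse(pred, true):
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def mae(pred, true):
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mape(pred, true):
    """Mean absolute percentage error; assumes every true RUL is nonzero."""
    return 100.0 * sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


pred = [100, 50, 20]
true = [90, 60, 20]       # errors d = [10, -10, 0]
metrics = {
    "Score": score(pred, true),
    "Accuracy": accuracy(pred, true),
    "RMSE": rmse(pred, true),
    "MAE": mae(pred, true),
    "MAPE": mape(pred, true),
}
```

Note that an overestimate of 10 cycles contributes more to Score than an underestimate of 10 cycles, which reflects the higher cost of predicting failure too late.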
Several model parameters should be set in advance to train the proposed DCHIEN. Thus, to find the optimal model parameters that minimize the RMSE, we used Bayesian optimization, which is an optimal parameter search method based on a Gaussian process (Snoek et al. 2012; Jang et al. 2020). The training profiles of FD001 were used for Bayesian optimization. The model parameters that were found through Bayesian optimization included parameters related to the model structure and training; these model parameters and set values are listed in Table 2. In addition, some of the model parameters were preset to reduce the Bayesian optimization search space. First, the SDAE has two hidden layers, and the CHI extractor has one hidden layer. The training batch size was set to 100, and the hyperbolic tangent was used as the activation function of the hidden nodes.

CHI monitoring and RUL prediction results
In general, the RUL is the best abstraction that can show the current health status in a real-world PHM scenario. Thus, the more accurately the HI reveals the RUL change pattern, the better it can be used for monitoring purposes. Figure 3 shows the CHI change patterns according to C for the 24th and 38th test profiles of FD003, which have almost all the data samples until just before system failure occurs. Figure 3 includes the CHI values, the true RUL values for each cycle, and the CHI trend line. The changes in the CHI values were similar to the shape of a sigmoid curve. Thus, the SciPy Python package was used to find the optimal sigmoid trend line that best fit the pattern of CHI change. Figure 3 shows that a single CHI cannot represent the health status for all cycles. However, the CHI can precisely reflect the RUL change pattern in certain intervals. For example, when C was 50, the CHI change pattern was almost the same as the RUL change pattern in cycles in which the true RUL was less than 40. In contrast, when C was 80 and the true RUL values of the cycles were above 80, as shown in Fig. 3(b) and (d), the CHI change patterns were almost the same as the changes in the true RUL. These observations show that engineers can mitigate the uncertainty in diagnosing the current health status by monitoring the CHI changes based on low C values after the engine has been running for a long time and based on high C values when the engine is healthy. Figure 4 shows the RUL point estimate and its prediction interval results for the 24th and 38th test profiles of FD003. At first, the predicted RUL estimate follows the true RUL better than the CHI. However, in some intervals, the predicted RUL does not fit well with the true RUL. For example, when the true RUL is larger than 80 and less than 150, there is a large difference between the true and predicted values for the two profiles in Fig. 4. 
In this case, engineers can apply the prediction interval, which can cover the true RULs for most cycles, as supplemental information to support robust maintenance decisions. Furthermore, considering that the CHI produced with C = 80 can reflect the RUL change pattern where the true RUL is above 80 in Fig. 3, the CHI can be used as auxiliary information to compensate for errors in RUL prediction.

Ablation study
This section provides an ablation study to analyze the contribution of each component to the RUL prediction performance. FD001 and FD003 were used for the ablation study, and all available samples in the testing profiles were used. In other words, not only the sample just before engine failure but also the other samples were used for testing. For this analysis, we varied the time window size $TW$ to confirm its effect. Specifically, we compared the following baselines:
1. NN: The first baseline model is a neural network that adopts the same network configuration as DCHIEN without a condition node. This model is trained to directly predict the RUL.
2. NN with stacked autoencoder (SAE) pretraining (NN-SAE): In this baseline, the first two hidden layers of the neural network are pretrained using the SAE's layer-by-layer training method.
3. NN with SDAE pretraining (NN-SDAE): Random noise corruption for denoising is added in the layer-by-layer pretraining of NN-SAE.
4. CHI modeling with SAE pretraining (CHI-SAE): Here, the denoising component for sensor noise reduction is excluded from the proposed model.
5. Proposed method: The proposed method can be seen as adding the denoising component to CHI-SAE, adding the CHI modeling component to NN-SDAE, and adding both components to the simple neural network.

Table 2 Model parameters and set values
Learning rate for the 1st hidden layer of the SDAE: 9 × 10^−5
Learning rate for the 2nd hidden layer of the SDAE: 4 × 10^−3
Fine-tuning learning rate: 2 × 10^−4

Fig. 3 The CHI change patterns for the 24th test profile of FD003 when (a) C = 50 and (b) C = 80, and those for the 38th test profile of FD003 when (c) C = 50 and (d) C = 80

The experimental results are given in Tables 3 and 4. The best and second-best results are highlighted in bold and underlined, respectively. First, for all metrics, a large $TW$ contributes to an improvement in performance. However, in practice, there is a limitation on $TW$, and the contribution becomes marginal as $TW$ increases (Li et al. 2018). Because each evaluation metric has a different characteristic, the proposed method could not provide the best performance for all metrics in all experimental settings. However, the two tables show that the proposed method achieved the best or the second-best performance for each combination of $TW$ and dataset type, except in only one case. In addition, CHI-SAE showed the second-best overall performance.

To analyze the contribution of each component of DCHIEN in more detail, we calculated the error reduction ratio by comparing the proposed method with CHI-SAE, NN-SDAE, and NN. Table 5 shows the average error reduction ratios of the MSE, MAE, and MAPE. The ablation analysis reveals that each individual component contributes to the improvement in RUL prediction performance, and the best performance is produced when the two components are combined. In particular, CHI modeling not only supports engineers by providing information on RUL prediction intervals and CHI change patterns but also improves the RUL prediction performance by 6.17% on average.

Comparison with state-of-the-art methods
In this section, the proposed method is compared with state-of-the-art RUL prediction methods that harness advanced machine learning techniques to learn degradation patterns or to directly predict the RUL. For a fair comparison, we followed a general evaluation protocol that predicts the RUL at the end point of each testing profile. Tables 6, 7, and 8 show the comparison results in terms of Score, RMSE, and Accuracy, respectively. Accuracy was reported for only two methods in the literature. The method with the best performance for each evaluation criterion is highlighted in bold.
The proposed method achieved the lowest Score and RMSE for two subdatasets, showing the best overall results. Specifically, in terms of Score, RULCLIPPER also achieved the best performance for two subdatasets. However, RULCLIPPER was not able to show good results with the other metrics. Similarly, CNN and CNN+LSTM, which obtained the lowest RMSE on one subdataset, performed poorly in terms of Score. In addition, the proposed method achieved the highest Accuracy for all subdatasets. The experimental results reveal that the proposed RUL prediction method can provide the best overall performance, although it may perform slightly worse in a few cases.
Finally, we evaluated the degree to which Accuracy could be increased by considering the RUL prediction interval. If we can effectively increase Accuracy with a small prediction interval, robustness in managing a system can be achieved. That is, fatal system failure can be more effectively prevented by considering all time points in the RUL prediction interval as possible failure points. Table 9 shows the change in Accuracy (%) and the length of the prediction interval according to δ. First, Accuracy significantly increases as the length of the RUL prediction interval increases. More interestingly, it is found that very small prediction intervals can even help prevent system failures. For example, when δ = 0.01, the average Accuracy increased by 2.5% compared to the average Accuracy of the proposed method in Table 8 by considering 0.69 more points as possible failure points. These results show that uncertainty in RUL prediction can be managed well with the proposed method. In terms of Accuracy improvement by length, the table shows that up to a 1.98 Accuracy improvement can be obtained by considering an RUL interval of 1. On average, we can achieve a 1.35 Accuracy improvement only with an RUL interval of 1 if we apply δ = 0.01.

Conclusion and discussion
In this paper, we propose a PHM method that can manage the uncertainty introduced by sensor noise, uncertainty in health status diagnoses, and uncertainty in the RUL predictions. The proposed PHM method is based on the CHI information extracted by the DCHIEN model. The experimental results show that the proposed method can properly monitor the degradation patterns for all intervals, from a healthy engine state to a system failure, by monitoring multiple CHIs.
The proposed RUL prediction method shows the best performance under the largest number of test scenarios. In addition, it is shown that poor RUL predictions can be compensated with a small RUL prediction interval.
We show that the uncertainties in PHM can be effectively managed using DCHIEN. The proposed PHM method can address many practical issues related to the uncertainties in PHM at the operational level. However, this paper still has the limitation that the uncertainties are considered qualitatively without measuring quantitative uncertainty reduction, despite the definitions of uncertainties suggested in Sect. 2.1. This is because current definitions of uncertainties require the true value of the current health status, which is difficult to know in advance. If uncertainty can be precisely measured, engineers can use this information for more systematic PHM. Thus, this issue should be addressed.
Specifically, the following two topics are considered for follow-up work: First, the proposed model should be extended so that the RUL prediction intervals are based on a well-grounded probabilistic foundation, which will allow more systematic management of uncertainty in RUL predictions. Second, the model presented in this paper cannot quantify uncertainties. Thus, further studies will be conducted to define and quantify uncertainties with more precise mathematical expressions for more systematic uncertainty management.

Author contributions
JJ is the only author of this paper. Therefore, all contributions are made by him.

Data availability
The datasets generated during and/or analyzed during the current study are available at https://www.kaggle.com/code/phamvanvung/cmapss.

Conflict of interest
The authors declare that they have no conflicts of interest to this work.

Ethical approval
This paper does not deal with any ethical problems.

Informed consent
We declare that all authors have informed consent.