Semi-Markovian approach to optimizing the time of systematic preventive maintenance and the level of maintenance over a nite horizon

This work is concerned with the problem of optimizing maintenance policies in terms of economical rewards and availability. We consider a system with multiple states in terms of healthy mode (good state, degraded state and failure state) and maintenance action (running state, stopped for maintenance). The level of maintenance (perfect or not) is also taken into account. We propose semi-Markovian model highlighting the e ﬀ ects of dwell times and transitions on economical rewards. We determine an optimal policy conditionally upon the current state according to eight decision parameters related to time intervals between two preventive maintenances and the level of maintenance. We show through a sensitive analysis that decision parameters have nonlocal e ﬀ ects that imply a multiple objective function. Hence, we propose a compromise by optimizing the asymptotic average reward.


Introduction
The reliability and availability of equipments constitute a guarantee of competitiveness for any production company (Fedele, 2011;Stapelberg, 2009). The reliability of a production equipment is its ability to perform its mission correctly if it is called in a given time frame (Stapelberg, 2009). Unfortunately, all equipment is subject to degradation mechanisms resulting in various failure modes and possible negative effects on operation (Rémy, 2008). Faced with this situation, maintenance managers need to consider sound maintenance strategies based on reliability (Ren et al., 2017;Werbińska-Wojciechowska, 2019). They may decide to perform corrective maintenance as a result of a hardware failure. This approach remedies or compensates for the problem itself, but it does not avoid direct consequences such as poor product quality and unavailability during maintenance. Preventive maintenance (systematic, conditional or predictive maintenance) is intended to delay and limit the impact of total failure. However, a high frequency of preventive maintenance leads to additional costs and limits the sought availability (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021). It is therefore necessary to choose the right maintenance strategy (Meuwisse et al., 1993). Following (Grabski, 2014), a maintenance strategy is a succession of maintenance policies. A maintenance policy at a given time is expressed as an application from all system states to all maintenance decisions.
Several works exist on the modeling of the reliability and the optimal maintenance of multi-state systems (Gürler and Kaya, 2002;Levitin et al., 2005;Li et al., 2010;Li and Zuo, 2008;Massim et al., 2005;Nourelfath and Ait-Kadi, 2007;Nahas et al., 2008;Tan and Raghavan, 2008;Tian et al., 2009;Vaurio, 1984;Wu and Chen, 1994). Huang and Yuan have studied a preventive maintenance policy for a system with multiple degradation states, under periodic inspection, and with multiple candidate preventive maintenance actions (Huang and Yuan, 2010). The work in (Huang and Yuan, 2010) does not, however, take in account the inspection and maintenance action times (dwell time). Using Poisson processes, the authors of (Wu et al., 2010) have optimized mixed (preventive and corrective) maintenance by conditional imperfect replacement or reparation thresholds related to the level of degradation and the time remaining in the finite life cycle. The levels of degradation are finite and assumed to be irreversible. The model studied in (Soro et al., 2010) is similar to that in (Wu et al., 2010), but some levels of degradation are reversible. No optimization is done there. Gurler and Kaya proposed a maintenance policy similar to that of (Soro et al., 2010), within the framework of a multi-component system (Gürler and Kaya, 2002). Optimization is made on the basis of systematic and perfect preventive maintenance from a number of components beyond a certain degradation threshold. Recently, the authors in (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021) approached the optimization of the periodic preventive maintenance interval of a system by taking into account gains/losses due to dwell times in each of the three maintenance states (preventive maintenance, corrective maintenance, operation). Gains or losses related to state transitions are also taken into account. The work in (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021) generalizes the notion of system state by combining the state itself with the maintenance action in progress. However, they do not take in account neither the degradation nor the level of maintenance (minimal or imperfect maintenance and perfect maintenance). More recent work by Smithson and Sarkar investigates the optimal number of imperfect repairs that should precede the perfect repair, without consideration of preventive maintenance (Smithson and Sarkar, 2021). This paper has a multiple purpose. Firstly, it aims to propose a semi Markovian model of maintenance highlighting the combined effects of the periodicity of preventive maintenance, the dwell times of the system in its different states, and the choice of maintenance level on the gains/losses. Our second objective is to determine for a finite horizon the optimal values of the parameters aforementioned. Indeed, the optimization of system maintenance over a finite horizon is less discussed in the literature despite its relevance (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021;Wu et al., 2010). The choice of the horizon being rather subjective, and starting from Bellman's principle of dynamic programming, we study the optimality of the maintenance policy at the moving horizon 1 , conditionally upon the current state of the system (Grabski, 2014). The resulting maintenance strategy is therefore stationary.
The rest of the work is organized as follows. In Section 2, we present the global methodology of the work. The abstract model we study is presented in subsection 2.1. We describe the method to study the effects of decision parameters on maintenance policy rewards in subsection 2.2. The objective functions to optimize are presented in subsection 2.3 while the optimization procedure is presented in subsection 2.4. In Section 3 we apply the methodology stated in Section 2 to a case study inspired by (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021). The paper ends with a conclusion in Section 4.
2 Analysis and optimization methodology 2.1 Modeling of a multi-state dynamics Without loss of generality, the abstract system considered in this work is mono-component and has six states. In order to facilitate the understanding necessary for any adaptation or extension, it seems inadequate to consider a very large number of states. The space of states is obtained by combining three system modes (healthy state, degraded, and failure) and three maintenance decisions (putting the system into operation whether it is in healthy or degraded state, minimal repair regardless of the mode, perfect repair in the degraded and failure modes). The result is 9 pairs (mode; decision) of which the following 6 are more relevant: 1. Healthy and functioning system; 2. System in apparently healthy state but stopped for preventive maintenance (inspection or routine maintenance activities); 3. System in a degraded state but in operation; 4. Degraded system and stopped for repair (perfect maintenance); 5. System failure and shutdown for minimal repair (palliative maintenance); 6. Faulty and stopped system for perfect repair or replacement (curative maintenance).
The model is graphically described in Figure 1. The transitions between different states of the system take place in discrete instants whose steps correspond to the dwell times in each state. The dynamics of the system can be described by the matrix of jumps P : In the expression of the matrix of jumps P, τ i , i = 1, . . . , 6 denotes a threshold of time when systematic maintenance actions are carried out. F i , i ∈ {1, 3} is the cumulative distribution function of the dwell time  Figure 1: State transition graph in the state i and F i , i ∈ {2, 4, 5, 6} is the cumulative distribution of the repairing time when the system is in the state i. The parameter τ i , i ∈ {1, 3} refers to the systematic preventive maintenance interval in state i. τ i , i ∈ {4, 5, 6} denotes the threshold time for systematic renewal of the current maintenance action if it has not been completed before. γ is the probability to move from the healthy operating state to the degraded state before the time of systematic preventive maintenance τ 1 while 1 − γ is the probability to move from the healthy operating state to the failure state before the time of systematic preventive maintenance τ 1 . The probabilities α 1 , α 4 ∈ [0, 1] are those of letting the system run in a maintenance-free state. The probabilities α 2 , α 3 , α 5 , α 6 ∈ [0, 1] refer to the choice of the level of maintenance (minimal or perfect). In particular, if α i ∈ {0, 1}, there is a conditional maintenance. All the α i 's are null in (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021). The use of cumulative distribution functions F i and the probability γ shows a predictive aspect of the maintenance (γ is 1 in (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021)). Hence, the model (1) highlights both corrective and preventive aspects of maintenance. Through the level of repair (minimal or perfect) we can clearly distinguish palliative maintenance from curative maintenance. It is therefore possible to objectively choose the most suitable maintenance policy.

Assessment of the effects of decision parameters
In this section, we present a method for analyzing the effects of decision parameters and their interactions on maintenance policy gains. The aim is to carry out a sensitive analysis that allows us to make a preliminary judgment of the relevance of the decision variables. There are several approaches to sensitive analysis (Berger et al., 2017;Chitnis et al., 2008;Easterling, 2015;Goupy and Creighton, 2006;Marino et al., 2008). One of them is the computation of the normalized index of long-term sensitive introduced in (Chitnis et al., 2008). The latter evaluates the effect of a parameter being close to a reference value. Several indexes for global sensitive analysis are present in (Marino et al., 2008). Examples include Pearson Correlation Coefficient (CC), Spearman Rank Correction Coefficient (RCC), Partial Correction Coefficient (PCC), Standard Correction Coefficient (SCC), of the standard rank correlation coefficient (SRCC), the partial rank correlation coefficient (PRCC) and the extended Fourier analysis sensitive test (eFAST). These indexes are calculated on the basis of several experiments whose planning is of great importance. The use of experiment plans remains one of the simplest and most relevant methods to implement when one does not have a good analytical knowledge of the dependency on parameters. Moreover, it allows, through the codification of values, to avoid the effects of scale in order to highlight the intrinsic effects. We adopt complete experimental designs with two levels (Berger et al., 2017;Easterling, 2015;Goupy and Creighton, 2006) with the responses replaced by rank coefficients. Hence, our approach is a combination of two levels experimental designs with partial rank correlation coefficient analysis.
In the model stated in Section 2.1, there are 8 decision parameters. We code the low levels as −1 and the high levels as +1. This is done through the linear transformation Hence, we consider an experimental matrix having 2 8 = 256 experiments illustrated in Table 1, where ∀x ∈ R, ⌊x⌋ ∈ N and ⌊x⌋ ≤ x < ⌊x⌋ + 1.
The model associated with this experimental study is given by the equation (2): where a 0 denotes the response at the center of experimental domain and a i 1 ...i k are the effects of different interactions of order k between variables x i 1 . . . x i k . The model has 2 8 = 256 coefficients whose relative significances can be evaluated by the Daniel or Pareto methods (Berger et al., 2017;Easterling, 2015;Goupy and Creighton, 2006). The Pareto method we will use is to order the a i 1 ...i k 's by decreasing absolute values, to make an increasing accumulation of these absolute values, and to retain the a i 1 ...i k 's whose accumulation represents a 'significant' fraction of the total accumulation: In (3), S k is the relative effect of the interaction variable x k and CS k is the relative cumulative effect of the interaction variables x 1 , . . . , x k .

Formulation of the optimization problem of gains / losses
In this section, we look at the optimal maintenance policy through an appropriate choice of decision parameters such as τ i and α i . We will use a similar costing approach to the one in (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021).
Signed rewards are considered to be losses when they are negative and gains when they are positive. Rewards can be evaluated proportionally to the time spent in one state (R ii ) or relatively to the transition from one state to another (R i j ). Thus, we can consider the matrix of elementary rewards R i j 1≤i, j≤6 . By posing T i j the average dwell time in the state i before transiting to the j state, we can calculate the average transit gain from the i state to the j state by with 6 k=1, P jk 0 P jk T jk denotes the average dwell time in the state j. Contrarily to (Sànchez Herguedas et al., 2020), the term R j j 6 k=1, P jk 0 P jk T jk is considered in order to take in account the average impact of the current decision on the future. For convenience, we could replace • by +∞ in order to materialize the impossibility of the transition. However, to easier the calculation it seems better to replace it by zero. If n designates the jump horizon, and we note J i (n) the accumulation of the average gains after n jumps starting from the state i, we have the recurrence In matrix form, We note that if we neglect the transition costs in the states, J restricts itself to The explicit determination of the analytical expression of J (n + 1) can be done by explicit computation of n k=0 P k when we have access to the Jordan's form associated with P.Alternatively, if the inverse of P is known, we can use the transformation to z and its inverse as in (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021). Unfortunately, this is difficult when the P matrix is full and relatively large (from (4 × 4)).

Optimization by experimental designs
As with the evaluation of the effects of decision variables and their interactions, multiple variable optimization of a function whose analytical expression is unknown can be complex. While many nonlocal optimization heuristic algorithms exist, they can be difficult to implement and very computationally expensive for an uncertain result. A detailed discussion of these heuristic methods is given in (Dréo et al., 2006). Optimization by design of experiments generally consists in proposing an empirical quadratic model whose analytical study is relatively easy to carry out (Berger et al., 2017;Easterling, 2015;Goupy and Creighton, 2006). This model is based on a design of experiments for at least three levels (bottom: −1, center: 0 and top: 1).
From the optimization problem (11) given in Section 2.3, we consider the polynomial regression function having 8 + 2 8 = 45 coefficients, To evaluate the coefficients of the polynomial regression model (16) we need at least 45 points. We will consider the complete design of experiments at 3 levels which corresponds to 3 8 = 6561 experiments. Apart from the costs in memory space and high computation time, a large number of points has the disadvantage of letting the local extremes appear in the non-convex case. Highlighting them can make the optimization procedure combinatorial and can limit the application of local optimization techniques in terms of premature convergence (Karmanov, 1977;Luenberger et al., 1984). We will simply keep the points of interest defined by the complete two-level plan completed by the central point having the codification 0.
The Jacobian matrix of the objective function F will be approximated by the matrix of elements The Hessian matrix of the objective function F will be approximated by the symmetrical matrix Thus, if the Hessian matrix is negative definite (all its eigenvalues are negative) then the global maximum of F approached by P D −H −1 F [a 1 , . . . , a 8 ] T , where P D is the projection into the experimental domain D.
To anticipate the difficulties liked to the possible conditioning of the matrix H F , we solve the equation (Fortin, 2001;Fortin, 2011;Jedrzejewski, 2005). It is a question of decomposing −H F into the product LL T where L designates a lower triangular matrix 8 × 8, and of successively solving the equations LY = [a 1 , . . . , a 8 ] T and L T X = Y. L is given by the relations 3 Case study: dump truck diesel engine blower We will reconsider, the example studied in (Sànchez Herguedas et al., 2020). It is a company that, during one phase of a project, uses assets (two-stroke diesel engine, 16 V, 1800 kW) equipped with an air intake fan whose critical and dominant mode of failure is leakage through the blower oil reservoir. There are two levels of leakage severity: the micro-leakage level that does not prevent operation, and the leakage level that requires major leaks to be repaired (minimum repair) or replaced (perfect repair by replacement). It should be noted that it is possible to proceed to a replacement or to plug micro leaks when they occur. It is assumed that after a reliability study, degradation and failure can be represented using Weibull laws of form parameters, of scale and position (σ, λ, µ), and the regression times follow the normal law (censored by the positivity constraint). Concretely, it is a matter of taking where σ i > 0 is the form parameter, λ i > 0 is the scale parameter and µ i ≥ 0 is the position parameter. For i ∈ {1, 3}, σ i > 0 is the dispersion parameter, and µ i > 0 is the position parameter for i ∈ {2, 4, 5, 6}. This choice leads to the average dwell times (in hours) The numerical values adopted for the parameters of the F i laws stated above are specified according to the following Table 2. Based on the considerations in (Sànchez Herguedas et al., 2020), the elementary signed gains R i j are given in euros per hour (R ii ) and in euros (R i j , i j) according to the matrix -1 0 −1 −500 -500 -290 -20 0 −1 −500 -500 −4100 0 3 −1 −500 −500 −4100 0 −2050 −20 −500 −500 −4100 0 −2050 0 −20 −500 -4100 0 −2050 0 −20 -20 The bold cells in the matrix R are those from (Sànchez Herguedas et al., 2020). The rest of the values have been deduced or simply assumed. The authors in (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021) have considered several horizons without giving an objective approach to the choice of horizon. We believe that the maintenance budget is very often associated with the general budget of companies. The latter can be annual or for the duration of a project. It seems reasonable to define a maintenance policy on a horizon relative to these aspects. Considering the hour as a unit of time, an annual horizon corresponds to n = 24 × 365 = 8760 at most. Indeed, the continuous time horizon is proportional to the discrete horizon with respect to the average dwell time. The maximum discrete horizon considered in this example is 15. According to Bellman's principle, the optimization of a n + 1 horizon involves optimizing the n horizon, then optimizing the transition between n and n + 1 (Grabski, 2014). Thus, one can simply consider a moving-horizon optimization of µ n J (1), where µ n designates the initial state probability distribution at horizon n. Figure 2 displays the effects of each decision parameter on J i (1) , i = 1, . . . , 6. J 1 (1) is more sensitive in decreasing order to τ 1 , τ 3 , α 1 and α 2 . τ 1 has a positive effect on the growth of J 1 (1) and that effect is preponderant. Thus, it is not recommended to proceed to the preventive maintenance rapidly when the system is still in good state. The same analysis holds for the negative effect of τ 1 on J 2 (1), J 4 (1) and J 6 (1). τ 3 has a negative indirect effect on the respective growths of J 1 (1) and J 4 (1). Indeed, a rapid maintenance when degradation appears permits to move faster to the good operating state which generates higher positive rewards. The negative effect of τ 3 on J 6 (1) can be interpreted by the fact that late maintenance can lead to a state of total failure that generates significant losses and costs.

Effects of decision parameters
J 3 (1) is more sensitive in decreasing order to τ 1 , τ 3 and α 3 . τ 1 has a negative effect on J 3 (1) contrarily to τ 3 and α 3 . The negative effect of τ 1 can be explained by the fact that a late preventive maintenance may lead to degradation which reduces productivity of the system. The positive effects of τ 3 and α 3 mean that once the system is already degraded, to decide to maintain it generates cost while letting the system to run produces gains. The same explanation holds for the positive effect of α 4 on J 4 (1).
J 5 (1) is more sensitive in decreasing order to τ 1 , τ 3 and α 5 . τ 1 has a negative effect on J 5 (1) contrarily to τ 3 and α 5 . The negative effect of τ 1 can be explained by the fact that a late preventive maintenance may lead to failure which generates costs. The positive effect of τ 3 expresses the fact that a preventive (perfect) maintenance should not occur immediately after a palliative (imperfect) maintenance. The positive effect of α 5 means that palliative (imperfect) maintenance is low cost with respect to curative (perfect) maintenance. The same explanation holds for the positive effect of α 6 on J 6 (1). Figure 3 shows the effects of each decision parameter on the asymptotic average operating times AAOT 1 , AAOT 2 , AAOT and AAR. AAOT 1 and AAOT 2 are more sensitive to τ 1 and τ 3 . Increasing the time interval (τ 1 ) between two preventive maintenances when the system is in the good state allows to have a long operating time in the good state while it reduces the time spent operating in the degraded state. Similarly, increasing the time interval (τ 3 ) between two preventive maintenances when the system is in the degraded The global asymptotic average operating time is sensitive to almost all the decision parameters. The effects of τ 1 , α 3 and α 5 are positive. Thus, short time interval between two preventive maintenances when the system is in the good state is prohibited to increase the availability of the system. On the other hand palliative maintenance is better than curative maintenance, when the system moves from degraded state to failure state or when a palliative maintenance has been already started. The effects of τ 3 , α 1 , α 2 , α 4 and α 6 are negative. Hence, in order to increase the availability, perfect maintenance is recommended when the system moves from a good state to a degraded state or failure state. On the other hand fast preventive maintenance is recommended when the system is already in the degraded state. The preponderant negative effect is the one of α 4 .

Optimal decision policy
Applying the procedure described in section 2.4, we study Hessian matrices of the J i (1)s, the AAOT i 's and AAOT . It follows that J 1 (1), J 2 (1), J 4 (1), J 6 (1) and AAOT 1 are concave while J 3 (1), J 5 (1) and AAOT 2 are convex. AAOT and the associated gain are neither convex nor concave. Thus, the maximums of J 3 (1), J 5 (1), AAOT 2 , AAOT and the associated gain belong to the boundary of the optimization domain. The search of optimal parameter leads to Table 3. Empty cells in Table 3 materializes the fact that certain parameters do not influence the optimum. The bold cells colored in blue correspond to decisions to be really taken at each state. For example, the choices of τ 1 , α 1 and α 2 should be done when the system is in good state.

Conclusion
In this work, we tackled the problem of optimal maintenance under several points of view. Indeed, the maintenance cost can be evaluated step by step or over a given horizon. We were interested by rewards (positive or negative) corresponding to different strategies. Hence, we started by proposing a Semi-Markovian model with 6 states representing both the healthy state and the current maintenance action. Our model considers good, degraded and failure modes. It also takes into account the three states :'running', 'perfect maintenance' and 'imperfect maintenance'. The advantage of a semi-Markovian model is to highlight the effects of dwell times on the maintenance rewards. The exploitation of the model follows two steps : sensitive analysis and optimization.
The models present 8 decision parameters concerning the time intervals between two preventive maintenance and the levels of maintenance. We use the experimental designs method combined with partial rank correlation coefficient analysis to get the effects of decision parameters on different objective functions.
These objective functions are : average rewards conditionally upon a given state of the system, the asymptotic average operating times, and asymptotic average rewards. It follows that all the parameters have a real significance but their effects are not the same for each objective function. Hence, we were facing a multiple objective problem.
The optimization procedure has been carried out using optimization experimental designs that provided us with quadratic models. The data we used came from references (Sànchez Herguedas et al., 2020;Sànchez-Herguedas et al., 2021). A spectral analysis of Hessian matrices permitted us to check the convexity or not of the problem. The optimum was sometimes found at the boundary of the research domain. We finally adopted a compromise solution maximizing the asymptotic average reward. The optimal strategy leads to an asymptotic average reward of 4.699 × 10 19 euros.
This work can be adapted in several similar contexts by increasing if necessary, the number of levels of degradation, the number of levels of maintenance and the number of maintenance actions (Soro et al., 2010;Wu et al., 2010). As result of optimization, it is highlighted that random policies can also be optimal. This enlarges the view on optimization of maintenance strategies.