Accelerated life reliability evaluation of grating ruler for CNC machine tools based on competing risk model and incomplete data

Grating rulers are high-precision linear displacement sensors used in the servo control system of CNC machine tools. Due to the long lifetime of these rulers, evaluating their reliability level by using traditional methods takes huge amounts of time and resources. Meanwhile, some incomplete data are observed in the failure data, which will cause inaccurate evaluation results. In this study, an accelerated life reliability evaluation method of a grating ruler based on a competing risk model and incomplete data is proposed. Firstly, according to the characteristics of the grating ruler, the life distribution model based on competitive risk is established. The representation and conversion methods of failure data, considering incomplete data, are also proposed. Secondly, the accelerated life test (ALT) system is set up, and the 4500 h step stress ALT is conducted. Results indicate that the reliability level of the grating ruler is MTBFt = 8864.12 h, and that of each subsystem is MTBFt(MS) = 13,351.98 h, MTBFt(OS) = 14,283.48 h and MTBFt(ES) = 17,038.65 h. The advantages of the proposed method are that it is helpful in determining the subsystem with the highest failure rate in each stage and puts forward targeted suggestions for subsequent improvement designs.


Introduction
Grating rulers are high-precision linear displacement sensors that integrate mechanical, optical and electronic technologies and are widely used in all kinds of CNC machine tools [1][2][3]. Grating rulers can measure displacement with high precision and quickly feed back the precise position of the feed shaft to the CNC system. These rulers are an important part of the closed-loop servo control system of CNC machine tools and an important factor affecting the machining accuracy of CNC machine tools [4,5]. Therefore, accurately evaluating the reliability level of grating rulers is of great significance for improving the machining accuracy of CNC machine tools [6][7][8]. Engineers in manufacturing enterprises usually conduct life tests on grating rulers. The traditional life test drives the reading head of a grating ruler continuously at a certain speed until failure occurs [9].
However, the life of grating rulers can usually reach more than 10,000 h, and the traditional life test consumes huge amounts of time and resources. By performing an accelerated life test (ALT) to induce grating ruler failures quickly and establishing the reliability model of the accelerated life process, the reliability level of grating rulers can be evaluated quickly [10,11].
ALT was first proposed by Levenbach in 1957. Its goal is to stimulate potential failures quickly by raising the stress levels of products and to use the acceleration model to convert the failure data under each stress level into the failure data under normal stress [12]. ALT can be used to evaluate the reliability of products accurately and efficiently, especially for products with many kinds of stress and complex failure mechanisms. After obtaining the failure data, determining how to establish an accurate reliability model according to the characteristics of products is the key problem [13][14][15][16]. To solve this problem, many scholars have conducted a series of studies. Lee et al. analysed a failure mode and failure mechanism on the basis of field failure and performed the ALT of the tripod shaft installed on the power bogies of a high-speed train. The torsional fatigue life of the tripod shaft was verified by the failure data under three stress levels, and the shape and scale parameters of the Weibull distribution model were obtained [17]. Dang et al. conducted ALT for standard and reduced-wall high-stress MV TRXLPE cables with different insulation thicknesses; a comprehensive statistical analysis based on maximum likelihood estimation (MLE) was also applied to reveal the life expectancy of reduced-wall high-stress TRXLPE cables [18]. Kalaiselvan et al. performed ALT for C0G and X7R nano ceramic capacitors. A nonparametric method was used to convert the accelerated condition data into an actual condition. Finally, the life of the nano ceramic capacitor under normal stress was calculated [19]. Rodriguez-Picon et al. presented different models for the reliability inferences of devices affected by more than one accelerating variable and established the general log-linear relationship model. Finally, an example was presented and applied to the resistances [20]. Liang et al. studied the influence of the change in the shape parameters of the Weibull distribution on the efficiency of step-down-stress ALT. The results showed that when shape parameters are relatively small, a small number of failure samples can be chosen, which can help shorten the test duration [21].
The above studies are focused on single life distribution models, such as the Weibull and exponential distributions. However, the structures of grating rulers are complex, and many failure locations and failure modes exist. Although single life distribution models can evaluate the overall reliability level of grating rulers, these models cannot describe the life distribution of each subsystem. The essence of the competing risk model is the cumulative failure model of the series system, which has higher accuracy than single distribution models in describing the life distribution of complex products [22][23][24][25]. Zhang et al. proposed to adopt an adaptive progressively hybrid censoring scheme in constant-stress ALT and established a dependent competing risk model by using bivariate Birnbaum-Saunders distribution [26]. Yang et al. proposed a reliability evaluation method of the motorised spindle on the basis of the competing risk model, which divided failures into catastrophic type or degradation type. The multivariate distribution was constructed using the copula function, and the correlation amongst failure data was determined by performing a hypothesis test [27]. Zhang et al. presented reliability and maintenance models for systems subject to multiple dependent competing failure processes with a changing, dependent failure threshold [28].
Grating rulers are optical, mechanical and electrical integrative products. The performance degradation of some parts in each subsystem will lead to a decline in the accuracy of grating rulers, such as the bearing and spring in the mechanical system (MS), the grating glass and light-emitting diodes in the optical system (OS) and the IC circuits in the electronic system (ES) [29]. Therefore, when the accuracy of grating rulers exceeds the limit, accurately determining the failure location is difficult, which means that these failure data are incomplete data. Incomplete data will lead to inaccurate reliability evaluation results.
In this study, a reliability evaluation method of a grating ruler accelerated life process based on a competing risk model and incomplete data is proposed. The proposed method not only considers the influence of incomplete data on the evaluation results but also describes the life distribution of each subsystem, which is helpful in putting forward some suggestions for the subsequent improvement designs of grating rulers.
The rest of this paper is organised as follows. In Section 2, the selected grating ruler is divided into three subsystems, and the sensitive stress is determined. In Section 3, the life distribution model of the grating ruler based on competing risk is established. The representation and conversion methods of failure data, considering incomplete data, are also proposed. In Section 4, an ALT system that can apply temperature and humidity stress is set up. A step stress ALT of the JFT series closed absolute grating ruler is performed, and its reliability level is evaluated using the proposed method. In Section 5, the proposed method is compared with the Weibull distribution model to verify its effectiveness. In Section 6, the study is summarised.

Characteristic analysis
Taking the HEIDENHAIN LC115 enclosed grating ruler as an example, its structure consists of ruler shell, grating, seal, reading head and other parts, as shown in Fig. 1. According to the principle of functional independence and structure independence, the grating ruler can be divided into the MS, OS and ES. The specific division details are presented in Table 1.
The MS mainly plays the roles of positioning and protection. The grating ruler is fixed on the mounting surface of CNC machine tools through the screw hole on the ruler shell. The ruler shell and the seal play a protective role together, which can effectively prevent cutting fluid, chips and dust from entering the interior of the grating ruler. The MS is mainly subjected to velocity stress and temperature stress. Its main failure modes include bearing jamming, declining spring performance and ageing seal. The main reason for the bearing jam and the decrease in spring performance is the excessive shock generated by the grating ruler when running at high speed. The main reason for the ageing of the seal is the thermal-oxidative ageing of the rubber caused by temperature stress, which intensifies the wear of the seal when the reading head is cyclically moved.
Moiré fringes are generated when the main grating and the indicating grating in the OS move relative to each other. The photosensitive element first converts the Moiré fringes into electrical signals, and then the displacement of the indicating grating relative to the main grating can be obtained after filtering the signals through the ES. The OS and ES are mainly affected by temperature stress and humidity stress. Under the combined action of temperature and humidity, water vapour will be produced. When the seal is aged to a certain extent, water vapour will bring pollutants into the ruler shell, which will cause the wear of the grating glass in the OS. At the same time, water vapour will also enter chips with slight manufacturing defects, thus causing the performance degradation of electronic components.
To sum up, temperature and humidity are the main stresses that cause grating ruler failures. In this research, temperature and humidity are selected as accelerated stress and velocity as constant stress.

Life distribution model
The data analysis process of ALT rests on five assumptions [30], of which Hypotheses 4 and 5 are invoked below. Considering that the competing risk model contains the reliability functions of three subsystems, the subsystem with the highest failure rate of the grating ruler can be observed through the competing risk model, which is convenient for putting forward targeted opinions on the improvement design of the grating ruler. Assuming that $x_1$, $x_2$ and $x_3$ respectively represent the theoretical life of the MS, OS and ES, the overall life of the grating ruler can be expressed as [22]

$$T = \min(x_1, x_2, x_3) \tag{1}$$

According to Eq. (1), the cumulative failure function of the grating ruler can be determined as

$$F(t) = P(T \le t) = 1 - P(x_1 > t,\ x_2 > t,\ x_3 > t) \tag{2}$$

According to Hypothesis 5, the complete failure of the product is only caused by the failure of one subsystem, and the failure time of each subsystem is independent of one another; Eq. (2) can be rewritten as

$$F(t) = 1 - \left[1 - F^{(\mathrm{MS})}(t)\right]\left[1 - F^{(\mathrm{OS})}(t)\right]\left[1 - F^{(\mathrm{ES})}(t)\right] \tag{3}$$

In general, the life distribution of the MS conforms to the Weibull distribution. The OS belongs to the optoelectronic system, and its life distribution also conforms to the Weibull distribution. The life distribution of the ES usually conforms to the exponential distribution model; Eq. (3) can be rewritten as

$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta_1^{(\mathrm{MS})}}\right)^{m^{(\mathrm{MS})}} - \left(\frac{t}{\eta_1^{(\mathrm{OS})}}\right)^{m^{(\mathrm{OS})}} - \frac{t}{\lambda_1}\right] \tag{4}$$

where $m^{(\mathrm{MS})}$ and $\eta_1^{(\mathrm{MS})}$ are the shape parameter and characteristic life of the Weibull distribution of the MS, respectively; $m^{(\mathrm{OS})}$ and $\eta_1^{(\mathrm{OS})}$ are the shape parameter and characteristic life of the Weibull distribution of the OS, respectively; and $\lambda_1$ is the characteristic life of the exponential life distribution of the ES.
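As a minimal numerical sketch of the series-system form in Eqs. (3) and (4), the following Python snippet evaluates the competing risk cumulative failure function from two Weibull subsystems and one exponential subsystem. The parameter values and the function names are illustrative assumptions only; they are not the values fitted in this study.

```python
import math


def weibull_cdf(t, shape, char_life):
    """Weibull CDF: 1 - exp(-(t / char_life) ** shape)."""
    return 1.0 - math.exp(-((t / char_life) ** shape))


def exponential_cdf(t, char_life):
    """Exponential CDF written with the characteristic life (mean life)."""
    return 1.0 - math.exp(-t / char_life)


def system_cdf(t, ms, os_, es_life):
    """Series-system CDF as in Eq. (3): 1 - product of subsystem survivals."""
    survival = (
        (1.0 - weibull_cdf(t, *ms))
        * (1.0 - weibull_cdf(t, *os_))
        * (1.0 - exponential_cdf(t, es_life))
    )
    return 1.0 - survival


# Hypothetical (shape, characteristic life in h) pairs, NOT the fitted values.
MS = (2.0, 15000.0)
OS = (2.5, 16000.0)
ES_LIFE = 17000.0  # hypothetical characteristic life of the ES in h

t = 5000.0
f_sys = system_cdf(t, MS, OS, ES_LIFE)
print(f"F({t:.0f} h) = {f_sys:.4f}")
```

Because the subsystems act in series, the system value is never smaller than the largest subsystem value at the same time t.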

Failure data representation method
The essence of the competing risk model is the cumulative failure model of the series system. Thus, determining the failure location is necessary. However, the performance degradation of some components in each subsystem of the grating ruler will lead to the decline of its accuracy. Therefore, when the accuracy of the grating ruler exceeds the limit, accurately determining the failure location is difficult. These failure data belong to incomplete data. To solve this problem, a traceability variable is added to describe the failure location, and the failure data are specifically expressed as

$$\left(t_{il},\ \varepsilon_{il}\right), \qquad \varepsilon_{il} = \left[\varepsilon_{il}^{(\mathrm{MS})},\ \varepsilon_{il}^{(\mathrm{OS})},\ \varepsilon_{il}^{(\mathrm{ES})}\right] \tag{5}$$

where $t_{il}$ is the lth failure time of the ith grating ruler and $\varepsilon_{il}$ is the traceability variable of the failure data. The failure marker value of a subsystem with failure is 1, and the failure marker value of a subsystem without failure is 0. The types of traceability variables in failure data can be divided into the following:
I. For complete data, such as a failure in the MS, the traceability variable is $\varepsilon_{il} = [1, 0, 0]$;
II. For incomplete data, such as the inability to accurately determine whether the failure occurred in the MS or the OS, the traceability variable is $\varepsilon_{il} = [1, 1, 0]$.
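The representation above can be sketched as a small data structure. The class name `FailureRecord`, its field names and the failure times used below are hypothetical and serve only to illustrate how complete and incomplete data differ in their traceability variable.

```python
from dataclasses import dataclass
from typing import List, Tuple

SUBSYSTEMS = ("MS", "OS", "ES")


@dataclass(frozen=True)
class FailureRecord:
    """One failure datum: a failure time plus its traceability variable."""
    ruler: int                     # i, index of the grating ruler
    index: int                     # l, index of the failure on that ruler
    time_h: float                  # t_il, failure time in hours
    trace: Tuple[int, int, int]    # markers for (MS, OS, ES), each 0 or 1

    def candidates(self) -> List[str]:
        """Subsystems in which the failure may have occurred."""
        return [name for name, flag in zip(SUBSYSTEMS, self.trace) if flag == 1]

    def is_complete(self) -> bool:
        """Complete data point to exactly one subsystem."""
        return sum(self.trace) == 1


# Hypothetical records: type I (complete) and type II (incomplete).
complete = FailureRecord(ruler=1, index=1, time_h=1200.0, trace=(1, 0, 0))
masked = FailureRecord(ruler=2, index=1, time_h=2300.0, trace=(1, 1, 0))

print(complete.candidates(), complete.is_complete())
print(masked.candidates(), masked.is_complete())
```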

Failure data conversion method
Given that the test stress in ALT is higher than that under normal working conditions, the failure data under the various stress levels must be converted to failure data under a constant stress level [24]. Assume that K groups of test stresses exist in ALT (the first group is normal stress, and the other groups are accelerated stress), that the loading time of the kth stress level $S_k$ is $\tau_k$, that the number of grating rulers is n and that $r_i$ failures are observed in the ith grating ruler, with the occurrence time of the lth failure expressed as $t_{il}$. According to Hypothesis 4, the equivalent conversion formula of the cumulative distribution function of the Weibull and exponential distribution models is

$$F_q(t_q) = F_p(t_p) \;\Rightarrow\; \frac{t_q}{\eta_q} = \frac{t_p}{\eta_p} \tag{6}$$

where $t_p$ and $t_q$ are equivalent test times under stress levels $S_p$ and $S_q$, and $\eta_p$ and $\eta_q$ are the characteristic lives under stress levels $S_p$ and $S_q$, respectively. Taking the Weibull distribution as an example, the effect of equivalent conversion is illustrated in Fig. 2.
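The accelerated stress in ALT is temperature and humidity, and the form of the acceleration model is as follows:

$$\eta = A \exp\left(\frac{B}{T}\right) \exp\left(\frac{C}{RH}\right) \tag{7}$$

where $\eta$ is the characteristic parameter; A, B and C are constants; T is the temperature stress value; and RH is the humidity stress value. After the logarithmic processing of the above equation,

$$\ln \eta = a + b\,\varphi(T) + c\,\phi(RH) \tag{8}$$

where $a = \ln A$, $b = B$, $c = C$, $\varphi(T) = 1/T$ and $\phi(RH) = 1/RH$. According to Eqs. (8) and (6), the conversion formula of test time $t_p$ under stress $S_p$ to equivalent time $t_p(p,q)$ under stress $S_q$ can be obtained:

$$t_p(p,q) = \frac{\eta_q}{\eta_p}\, t_p = t_p \exp\left\{ b\left[\varphi(T_q) - \varphi(T_p)\right] + c\left[\phi(RH_q) - \phi(RH_p)\right] \right\} \tag{9}$$

Assuming that the lth failure of the ith grating ruler occurs at stress level $k_{il}$, the conversion of occurrence time $t_{il}$ to stress $S_1$ is

$$t_{il}(b,c) = \frac{\eta_1}{\eta_{k_{il}}}\left(t_{il} - a_{k_{il}}\right) + b_{k_{il}} \tag{10}$$

where $a_{k_{il}}$ is the sum of the censored times of stress levels 1 to $k_{il}-1$, and $b_{k_{il}}$ is the censored time of stress levels 1 to $k_{il}-1$ converted to normal stress $S_1$, namely,

$$a_{k_{il}} = \sum_{u=1}^{k_{il}-1} \tau_u \tag{11}$$

$$b_{k_{il}} = \sum_{u=1}^{k_{il}-1} \frac{\eta_1}{\eta_u}\, \tau_u \tag{12}$$

According to Eqs. (11) and (12), Eq. (10) can be transformed into

$$t_{il}(b,c) = \frac{\eta_1}{\eta_{k_{il}}}\left(t_{il} - \sum_{u=1}^{k_{il}-1} \tau_u\right) + \sum_{u=1}^{k_{il}-1} \frac{\eta_1}{\eta_u}\, \tau_u \tag{13}$$

According to Eq. (9), $\eta_1/\eta_u$ and $\eta_1/\eta_{k_{il}}$ are calculated:

$$\frac{\eta_1}{\eta_u} = \exp\left\{ b\left[\varphi(T_1) - \varphi(T_u)\right] + c\left[\phi(RH_1) - \phi(RH_u)\right] \right\}, \qquad \frac{\eta_1}{\eta_{k_{il}}} = \exp\left\{ b\left[\varphi(T_1) - \varphi(T_{k_{il}})\right] + c\left[\phi(RH_1) - \phi(RH_{k_{il}})\right] \right\} \tag{14}$$

According to Eq. (13), time $t_j(b, c)$ (j = 1, 2, …, r) under normal stress $S_1$ can be calculated. Assuming that the jth failure datum comes from the lth failure of the ith grating ruler, the converted failure time is written for each subsystem, where * is the subsystem symbol (* = MS, OS, ES).
In addition, the censored time should be converted to the censored data of each subsystem under normal stress $S_1$. At the end of ALT (the censored time of stress $S_K$), if d grating rulers have stopped the test, 3(n − d) censored data $t_\tau$ will be generated. With D defined as the serial number set of the discarded grating rulers and i the number of the ruler, the censored time of each ruler outside D is converted for each subsystem, where * is the subsystem symbol (* = MS, OS, ES).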
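The conversion of a step-stress failure time to normal stress can be sketched as follows. This is a minimal illustration assuming the log-linear acceleration model ln η = a + b/T + c/RH with temperature in kelvin and relative humidity in percent. The stress levels and the constants `A_COEF`, `B_COEF` and `C_COEF` are hypothetical, since the actual levels are given in Table 3 and the actual constants come from the parameter estimation; only the stage duration of 1125 h is taken from the test plan.

```python
import math


def characteristic_life(a, b, c, temp_k, rh):
    """Log-linear acceleration model: ln(eta) = a + b / T + c / RH."""
    return math.exp(a + b / temp_k + c / rh)


def convert_to_normal(t, levels, durations, a, b, c):
    """Convert a time t on the step-stress clock to time under levels[0].

    levels:    list of (temperature in K, relative humidity in %) per stage
    durations: loading time tau_k of each stage in hours
    """
    eta = [characteristic_life(a, b, c, temp_k, rh) for temp_k, rh in levels]
    elapsed = 0.0    # sum of tau_u over the completed stages
    converted = 0.0  # the same stages expressed under normal stress
    for eta_u, tau_u in zip(eta, durations):
        if t <= elapsed + tau_u:
            return converted + (eta[0] / eta_u) * (t - elapsed)
        elapsed += tau_u
        converted += (eta[0] / eta_u) * tau_u
    raise ValueError("t exceeds the total test duration")


# Hypothetical stress levels between the stated limits (25-50 degC, 45-80 %RH).
LEVELS = [(298.15, 45.0), (308.15, 57.0), (315.15, 68.0), (323.15, 80.0)]
DURATIONS = [1125.0] * 4                      # tau_1 = ... = tau_4 = 1125 h
A_COEF, B_COEF, C_COEF = -2.0, 3000.0, 50.0   # hypothetical constants

for t_fail in (500.0, 2000.0, 4500.0):
    t_norm = convert_to_normal(t_fail, LEVELS, DURATIONS, A_COEF, B_COEF, C_COEF)
    print(f"{t_fail:7.1f} h on test -> {t_norm:9.1f} h at normal stress")
```

A failure in the first stage is left unchanged, whereas time spent in later stages is stretched by the ratio of the characteristic lives. The constant a cancels in every ratio, so only b and c affect the conversion.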

Parameter estimation
Given that the traceability variables in incomplete data are hidden variables, estimating the parameters directly by MLE is very difficult [31]. The expectation maximisation (EM) algorithm is an iterative algorithm for calculating maximum likelihood estimates of model parameters in the presence of hidden variables [32]. The steps of estimating parameters according to the EM algorithm are as follows:
Step 1 Establish the logarithmic likelihood function and give the initial value of the parameters $\hat{\theta}^{(0)}$.
Step 2 (E-step) Calculate the lower bound function $Q(\theta, \hat{\theta}^{(n)})$.
Step 3 (M-step) According to the lower bound function, the estimate $\hat{\theta}^{(n+1)}$ that maximises Q is calculated, and an iteration is completed.
Step 4 Given a minimum positive number δ, determine whether the iteration converges.
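The four steps can be illustrated with a deliberately simplified sketch. It assumes that all three subsystems follow exponential distributions, so that the E-step and M-step have closed forms; this is not the Weibull and exponential model of this study, and the function name, data and initial values are hypothetical. The E-step assigns each failure to the candidate subsystems in proportion to their current failure rates, and the M-step divides the expected number of failures of each subsystem by the total time on test.

```python
def em_masked_exponential(times, masks, censored_times, rates,
                          delta=1e-9, max_iter=1000):
    """EM estimate of failure rates for exponential subsystems in series.

    times:          observed failure times
    masks:          traceability variables, one 0/1 tuple per failure
    censored_times: running times of units that did not fail
    rates:          initial failure rates, one per subsystem
    """
    total_time = sum(times) + sum(censored_times)
    rates = list(rates)
    for _ in range(max_iter):
        # E-step: expected number of failures attributed to each subsystem.
        expected = [0.0] * len(rates)
        for mask in masks:
            denom = sum(r for r, m in zip(rates, mask) if m)
            for j, m in enumerate(mask):
                if m:
                    expected[j] += rates[j] / denom
        # M-step: closed-form maximiser for exponential lifetimes.
        new_rates = [d / total_time for d in expected]
        # Convergence check with threshold delta.
        if max(abs(n - o) for n, o in zip(new_rates, rates)) < delta:
            return new_rates
        rates = new_rates
    return rates


# Hypothetical data: four failures (one masked between MS and OS), one survivor.
TIMES = [100.0, 200.0, 300.0, 400.0]
MASKS = [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1)]
CENSORED = [500.0]

estimates = em_masked_exponential(TIMES, MASKS, CENSORED, [1e-3, 5e-4, 1e-3])
for name, rate in zip(("MS", "OS", "ES"), estimates):
    print(f"{name}: {rate:.6f} failures per hour")
```

With Weibull subsystems the M-step no longer has a closed form and would need a numerical optimiser, but the alternation between the two steps and the convergence check remain the same.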

Case analysis
In this section, the proposed method is used to evaluate the reliability level of the closed absolute grating ruler. The tested one is the JFT series grating ruler produced by Changchun Yu-heng Optical Co., Ltd. (China). Its accuracy is ±5 μm, resolution is 0.0025 μm, maximum speed is 180 m/min, working temperature ranges from 0 to 50°C, working relative humidity is from 20 to 80%, and sealing grade is IP53. The design distance of the tested grating ruler is 10,000 km. To improve the test efficiency and reduce the test cost, the step stress loading method is used to perform the test. The process is displayed in Fig. 3.

ALT system
The ALT device must be able to load the grating ruler according to the stress type in the actual working condition and to monitor the relevant parameters in real time [33]. The ALT device includes a servo driving platform and a temperature and humidity test chamber. The servo driving platform is placed inside the temperature and humidity test chamber, as shown in Fig. 4. The servo driving platform is used to apply the constant speed stress, and the temperature and humidity test chamber is used to apply the accelerated temperature and humidity stress. The servo driving platform has a vertical structure, and the structure of the front and rear sides is the same. The maximum number of test samples is six. A Panasonic A6 series servo motor is selected for the test, with a rated speed of 3000 r/min, maximum speed of 5000 r/min and rated torque of 3.18 N·m. The model of the temperature and humidity test chamber is SUSHI UMC-1200 (China). Its temperature loading range is from −40 to +150°C, temperature deviation is ±2°C, humidity loading range is from 20 to 98%, and humidity deviation is ±5%. The acceleration sensor is placed beside the reading head of the grating ruler. A Pt 100 temperature sensor is pasted on each servo motor.
The laser interferometer is used to regularly measure the accuracy without removing the grating ruler. The laser interferometer consists of optoelectronic elements, such as laser head, spectroscope and reflector; installation elements, such as magnetic suction seat, adjusting frame and tripod; and compensation elements, such as ambient compensation unit and ambient/material temperature sensor.

Stress loading scheme
According to the actual operation data of CNC machine tools in a gearbox workshop, the eight-level loading spectrum of speed stress is formulated, as shown in Table 2. The distance of the load spectrum is 6 km, the time consumed is 5.4 h, and the average speed is 309 mm/s. The temperature and humidity stress is the accelerated stress; the upper limit of temperature is 50°C, and its lower limit is 25°C; the upper limit of humidity is 80%, and its lower limit is 45%. The temperature and humidity stress is divided into four levels, and $S_1$ is the normal stress. The test time of the four levels is $\tau_1 = \tau_2 = \tau_3 = \tau_4 = 1125$ h, as presented in Table 3.
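The figures of the loading scheme can be cross-checked with a few lines of arithmetic. The sketch below recomputes the average speed from the stated distance and duration of one load spectrum, and the distance covered over the whole test under the assumption that the spectrum is repeated without interruption; that assumption, and the variable names, are not taken from the test records.

```python
SPECTRUM_DISTANCE_KM = 6.0   # distance of one load spectrum
SPECTRUM_DURATION_H = 5.4    # time consumed by one load spectrum
STAGE_DURATION_H = 1125.0    # test time of each stress level
N_STAGES = 4                 # number of temperature and humidity levels

# Average speed in mm/s: km -> mm is a factor 1e6, h -> s is a factor 3600.
avg_speed_mm_s = SPECTRUM_DISTANCE_KM * 1.0e6 / (SPECTRUM_DURATION_H * 3600.0)

# Spectrum repetitions per stage and total travel, assuming no downtime.
cycles_per_stage = STAGE_DURATION_H / SPECTRUM_DURATION_H
total_distance_km = N_STAGES * cycles_per_stage * SPECTRUM_DISTANCE_KM

print(f"average speed: {avg_speed_mm_s:.1f} mm/s")
print(f"cycles per stage: {cycles_per_stage:.1f}")
print(f"total distance: {total_distance_km:.0f} km")
```

The recomputed speed rounds to the stated 309 mm/s, and under the no-downtime assumption the 4500 h test corresponds to about half of the 10,000 km design distance.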

Evaluation and analysis
During the process of ALT, if a repairable failure occurs, then the test will continue after maintenance; however, if an unrepairable failure occurs, then the test of that grating ruler will be suspended. During the test conducted in this study, a total of 12 failures occurred, of which those in which the accuracy exceeded the limit were unrepairable failures. Failure information is shown in Table 4.
Using the method in Section 3.3, the failure data in Table 4 are converted to the data under normal stress $S_1$. In the range of possible distribution, the initial value of the parameters $\hat{\theta}^{(0)}$ = [5, 5, 8, 25, 45, 8, 25, 45, 8, 25, 45] is defined and the iterative convergence condition δ = 0.000001 is set, and the parameters of the model are then calculated. According to Eq. (8), the relationship between the characteristic life of each subsystem and the temperature stress and humidity stress can be obtained, as illustrated in Fig. 5.
According to Fig. 5, the characteristic life of each subsystem under normal stress $S_1$ can be calculated. According to Eq. (4), the cumulative distribution function F(t) of the grating ruler and the cumulative distribution functions $F^{(\mathrm{MS})}(t)$, $F^{(\mathrm{OS})}(t)$ and $F^{(\mathrm{ES})}(t)$ of the three subsystems can be obtained, as displayed in Fig. 6.
By differentiating F(t), $F^{(\mathrm{MS})}(t)$, $F^{(\mathrm{OS})}(t)$ and $F^{(\mathrm{ES})}(t)$, the probability density functions f(t), $f^{(\mathrm{MS})}(t)$, $f^{(\mathrm{OS})}(t)$ and $f^{(\mathrm{ES})}(t)$ are obtained, as shown in Fig. 7. According to the probability density function f(t), the mean time between failures of the grating ruler ($\mathrm{MTBF}_t$) is calculated. At the same time, the mean times between failures of the three subsystems ($\mathrm{MTBF}_t^{(\mathrm{MS})}$, $\mathrm{MTBF}_t^{(\mathrm{OS})}$ and $\mathrm{MTBF}_t^{(\mathrm{ES})}$) are calculated as follows:

$$\mathrm{MTBF}_t = 8864.12\ \mathrm{h}, \quad \mathrm{MTBF}_t^{(\mathrm{MS})} = 13{,}351.98\ \mathrm{h}, \quad \mathrm{MTBF}_t^{(\mathrm{OS})} = 14{,}283.48\ \mathrm{h}, \quad \mathrm{MTBF}_t^{(\mathrm{ES})} = 17{,}038.65\ \mathrm{h}$$

According to Fig. 6, when 0 < t ≤ 14,023 h, the ES has the highest cumulative failure rate and the OS has the lowest cumulative failure rate. The reason is that most of the early failures of the grating ruler, such as improper installation of the reading head, loosening of the fixing screw and poor contact of the plug, lead to abnormality of the digital readout. At this stage, the ES needs key maintenance. When 14,023 h < t ≤ 15,357 h, the cumulative failure rates of the three subsystems are similar. When t > 15,357 h, the MS has the highest cumulative failure rate and the ES has the lowest cumulative failure rate. The reason is that the precision of some parts in the MS is reduced. At this stage, the MS needs key maintenance. According to Fig. 7, compared with the OS and ES, the MS has the lowest reliability. In the subsequent improvement design of the grating ruler, the MS must be given focus. Note that as the amount of failure data increases, the above evaluation results will be more accurate.
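The mean time between failures of a competing risk model can also be obtained by integrating the reliability function R(t) = 1 − F(t) from zero to infinity, which equals the mean of f(t). The sketch below does this with a simple trapezoidal rule. The subsystem parameters are hypothetical and are not the fitted values behind the results above, and the integration limit `t_max` is an assumption chosen so that the truncated tail is negligible for these parameters.

```python
import math


def reliability(t, weibulls, exp_life):
    """Series-system reliability: product of subsystem survival functions."""
    exponent = sum((t / eta) ** m for m, eta in weibulls) + t / exp_life
    return math.exp(-exponent)


def mtbf(weibulls, exp_life=math.inf, t_max=2.0e5, steps=100_000):
    """MTBF as the integral of R(t) over [0, t_max] by the trapezoidal rule."""
    h = t_max / steps
    total = 0.5 * (reliability(0.0, weibulls, exp_life)
                   + reliability(t_max, weibulls, exp_life))
    for k in range(1, steps):
        total += reliability(k * h, weibulls, exp_life)
    return total * h


# Hypothetical (shape, characteristic life in h) pairs, NOT the fitted values.
MS = (2.0, 15000.0)
OS = (2.5, 16000.0)
ES_LIFE = 17000.0

mtbf_ms = mtbf([MS])
mtbf_os = mtbf([OS])
mtbf_es = mtbf([], ES_LIFE)
mtbf_system = mtbf([MS, OS], ES_LIFE)

print(f"MS {mtbf_ms:.1f} h, OS {mtbf_os:.1f} h, ES {mtbf_es:.1f} h")
print(f"system {mtbf_system:.1f} h")
```

For a single Weibull subsystem the result can be checked against the closed form η Γ(1 + 1/m), and the system value is always below the smallest subsystem value because the subsystems act in series.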

Comparison and discussion
The Weibull distribution model is usually used to describe the life distribution of a complex electromechanical system. In this section, the proposed method is compared with the Weibull distribution model to verify its effectiveness. The parameters of the Weibull distribution model are estimated through MLE [30,31,34] from the failure data in Table 4. According to Eq. (8), the temperature and humidity acceleration model of the grating ruler can then be obtained. According to this model, the estimated value of the characteristic life under normal stress $S_1$ is $\hat{\eta} = 9984.26$ h, from which the cumulative distribution function F(t) can be obtained.
The mean time between failures ($\mathrm{MTBF}_t$) of the grating ruler can then be calculated. When the average speed is 309 mm/s, $\mathrm{MTBF}_d$ = 10,289.05 km. The results of the proposed method are $\mathrm{MTBF}_t$ = 8864.12 h and $\mathrm{MTBF}_d$ = 9860.45 km, which is only 4.17% different from the Weibull distribution model. The advantage of the proposed method is that it can describe not only the overall life distribution of the grating ruler but also the life distribution of each subsystem. This helps determine the subsystem with the highest failure rate in each stage and put forward targeted suggestions for subsequent improvement designs.
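The relation between the time-based and distance-based figures can be reproduced with a short calculation. The sketch below converts between hours and kilometres at the stated average speed of 309 mm/s and recomputes the relative difference between the two models. Taking the Weibull result as the reference for the percentage is an assumption, as are the function and variable names.

```python
AVERAGE_SPEED_MM_S = 309.0
KM_PER_HOUR = AVERAGE_SPEED_MM_S * 3600.0 / 1.0e6  # mm/s -> km/h


def hours_to_km(mtbf_h):
    """Distance travelled in mtbf_h hours at the average speed."""
    return mtbf_h * KM_PER_HOUR


def km_to_hours(mtbf_km):
    """Running time needed to travel mtbf_km at the average speed."""
    return mtbf_km / KM_PER_HOUR


mtbf_t_proposed = 8864.12   # h, proposed competing risk model
mtbf_d_weibull = 10289.05   # km, single Weibull model

mtbf_d_proposed = hours_to_km(mtbf_t_proposed)
mtbf_t_weibull = km_to_hours(mtbf_d_weibull)

# Relative difference, using the Weibull result as the reference (assumption).
relative_gap = (mtbf_d_weibull - mtbf_d_proposed) / mtbf_d_weibull

print(f"proposed model: {mtbf_d_proposed:.2f} km")
print(f"Weibull model:  {mtbf_t_weibull:.2f} h")
print(f"difference:     {100.0 * relative_gap:.2f} %")
```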

Conclusion
Grating rulers are one of the key parts that affect the machining accuracy of CNC machine tools. Due to the long lifetime of grating rulers, evaluating their reliability level by using traditional methods takes huge amounts of time and resources. Meanwhile, some incomplete data exist in the failure data, which will cause inaccurate evaluation results. To solve the above problems, an accelerated life reliability evaluation method of a grating ruler based on a competing risk model and incomplete data is proposed. The summary of this study is as follows.  In the subsequent designs of grating ruler improvement, the MS must be given focus.