Robust Estimation Method Based on Function Model of Successive Approximation

Abstract: In measurement practice, the residuals of a least squares adjustment usually show various abnormal discrete distributions, including outliers, which hinders the optimization of the final measured values. Starting from the physical mechanism by which repeated observation errors disperse and produce outliers, this paper puts forward an error-correction idea: use an approximate function model of the error to approach the actual function model of the error step by step. It gives a new theoretical method for optimizing the final measured values and demonstrates the effectiveness of the algorithm by the ability of the results to respond to the true values. This new idea is expected to provide a final answer for robust estimation theory.


Introduction
In measurement practice, the residuals obtained by least squares adjustment usually show various abnormal discrete distributions, including outliers, so the adjustment result is not optimal. To solve these problems, academia has carried out extensive research and formed many principles and methods, such as robust estimation. However, these theories rest on the epistemological assumption that outliers come from wrong measurements, so weakening or eliminating the influence of outliers has become the common research direction. By studying the physical mechanism of the dispersion and outliers of observation errors, this paper shows how various regular errors produce dispersion and outliers, gives a successive approximation algorithm that uses the error's function model to realize error correction, and solves various abnormal distribution problems, so as to optimize the adjustment results.
At present, various studies prove the effectiveness of an algorithm by the improved distribution density (precision) of the final residuals. In fact, this gain in precision often comes at the expense of trueness, and it cannot demonstrate the algorithm's actual ability to respond to the true value, because the true value is unknown. This paper instead uses true values plus errors to simulate repeated observations, and verifies the effectiveness of the algorithm by the ability of the final measured value to respond to the true value. We will see that weakening the influence of outliers, or eliminating them, actually sacrifices the ability to respond to the truth [1,2,3,4,5,6,7,8].

Regularity and randomness of error
The regularity of an error means that there is a functional relationship between the error and some measurement condition. For example, the periodic error of a geodimeter is a sinusoidal function of the distance condition, the AC interference error in voltage measurement is a sinusoidal function of the time condition, the frequency error of a quartz crystal is a function of the temperature condition, the rounding error is a sawtooth function of the true value, the electronic noise error is a random function of the time condition, and so on. The randomness of an error means that all its possible values form a random distribution, i.e., the error lies within a limited probability interval. For example, the periodic error and the AC interference error follow a U-shaped distribution, the rounding error follows a rectangular distribution, the quartz crystal frequency error follows an M-shaped distribution, the electronic noise error follows a normal distribution, and so on, as shown in Figure 1.
That is, the regularity and randomness of errors are the results of observing errors from different perspectives; an error is the unity of regularity and randomness.

Dispersion and outlier of observation error in repeated measurement
In measurement practice, repeated measurement conditions are in a changing state. For example, in a leveling network survey, the instrument erection conditions (leveling, height, direction, temperature, etc.) of each route differ from one another; in a traverse network survey, the instrument erection conditions and distance conditions of each traverse also differ; in a GNSS network survey, the positions of the satellites in each observation period differ, as do the signal propagation conditions of each station.
When a measurement condition associated with a regular error changes, it inevitably drives the error to change; this is the physical mechanism by which regular errors cause dispersion. For example, when the distance condition changes between repeated measurements, the periodic error disperses the observation error sequence; when the time condition changes, the AC interference error disperses the observation error sequence; when the temperature condition changes, the frequency error of the quartz crystal disperses the observation error sequence; when the range condition changes, the rounding error disperses the observation error sequence; when the time condition changes, the noise error disperses the observation error sequence; and so on. Conversely, when the measurement conditions associated with a regular error remain unchanged throughout the repeated measurements, the error remains constant, producing an overall deviation of the observation error sequence.
However, actual measurement conditions usually neither change in a balanced way nor remain absolutely unchanged; they change in an unbalanced way, which inevitably drives the corresponding regular errors to change in an unbalanced way. This is the physical mechanism behind the abnormal distributions, and even outliers, of errors in measurement practice. For example, a seriously unbalanced change of the phase condition will cause the periodic error and the AC interference error to form an outlier distribution, a serious imbalance between rounding down and rounding up will cause an outlier distribution of the rounding error, a seriously unbalanced change of the temperature condition will cause an outlier distribution of the quartz crystal frequency error, and so on.
For example, Table 1 simulates repeated observations with a true value of 8 m by superimposing the periodic error δ_i = 5·sin(2π·S_i/20 + π/4) mm. It can be seen that the observed values S_i differ from one another, but their mean is 8.0014 m, indicating that dispersion and deviation coexist and that the distribution is uneven. Moreover, when the number of samples is small, the random superposition of several errors can also produce an outlier phenomenon.

Error processing with function model and random model
The dispersion and outliers of repeated observation errors come from regular errors, or even from the superposition of several different regular errors. The observation error sequence can therefore be regarded both as a set of regular errors and as randomly distributed errors. Naturally, both correcting the error with its function model and absorbing the error into the random model so that it self-compensates are effective adjustment schemes.
Example 1: use the observation data in Table 1 to find the best measured value with the random model and with the function model, respectively.
Treated according to the random model, the error equation is v_i = X̂ − S_i, and the best measured value obtained by least squares is the arithmetic mean, X̂ = 8.0014 m. The final error of 1.4 mm is much smaller than the 5 mm amplitude of the periodic error.
The function model of the periodic error is δ_i = A·sin(2π·S_i/20 + φ), so the error equation treated according to the function model is v_i = X̂ + A·sin(2π·S_i/20 + φ) − S_i. Writing a = A·cos φ and b = A·sin φ, the error equation becomes linear in the unknowns X̂, a and b, and the least squares normal equations can be formed and solved. Substituting the data, the measured value returns to the true value of 8 m and the estimated amplitude √(a² + b²) returns to 5 mm. In short, an error can not only be corrected by its function model but also be absorbed into the random model to realize its self-compensation.
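The contrast between the two treatments can be sketched numerically. The exact data of Table 1 are not reproduced here, so the sketch below simulates observations under an assumed periodic-error model (the sampling points and seed are illustrative, not the paper's):

```python
import numpy as np

# Sketch of Example 1 under an assumed error model: observations of a true
# distance X = 8 m carry a periodic error of amplitude 5 mm, sampled over an
# incomplete part of the period so that the plain mean is biased.
rng = np.random.default_rng(0)
X_true, A_true, phi = 8.0, 0.005, np.pi / 4
t = 2 * np.pi * rng.uniform(0, 12, size=20) / 20   # uneven phase sampling
obs = X_true + A_true * np.sin(t + phi)

# Random model: the best value is the arithmetic mean (biased here).
mean_est = obs.mean()

# Function model: v_i = X + a*sin(t_i) + b*cos(t_i) - obs_i, linear in (X, a, b).
design = np.column_stack([np.ones_like(t), np.sin(t), np.cos(t)])
X_est, a, b = np.linalg.lstsq(design, obs, rcond=None)[0]
amp_est = np.hypot(a, b)                            # recovered amplitude A
```

Because the simulated data follow the fitted model exactly, the function-model estimate recovers the true value and amplitude to machine precision, while the plain mean retains the bias caused by uneven sampling.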

The harm of eliminating outliers
However, in actual measurement the sources of error are many and miscellaneous, and it is impossible to know the functional law of every error, so function-model processing as in Example 1 is mostly unrealistic. Moreover, some errors cannot be corrected with a strict function model as in Example 1. For example, the rounding error in Figure 1 is a sawtooth function of the true value, but the true value is precisely what is unknown. This is quite different from the periodic error of Example 1; the random model seems to be the only way out, but random-model processing then faces the dilemma of an unbalanced error distribution. A simulation case illustrates this below.
Example 2: the true masses of three objects A, B and C are 5.1 g, 4.2 g and 7.2 g, respectively. The three objects are measured in combinations on a precision balance whose readings are rounded to the nearest gram; the original mass observations are shown in Table 2. We now compute the best measured value of each mass by least squares and observe how well the measured values respond to the true values.
Let the masses of the three objects be X_1, X_2 and X_3; the error equations are
v_1 = X̂_1 − l_1, v_2 = X̂_2 − l_2, v_3 = X̂_3 − l_3,
v_4 = X̂_1 + X̂_2 − l_4, v_5 = X̂_1 + X̂_3 − l_5, v_6 = X̂_2 + X̂_3 − l_6,
v_7 = X̂_1 + X̂_2 + X̂_3 − l_7.
By our past thinking, v_7 must be judged a gross error caused by wrong measurement and should be eliminated. After the elimination, least squares is applied again, but the result is actually counterproductive: the residuals look very comfortable and give a very high precision evaluation, yet the errors of the measured values are greater than without the elimination! That is, v_7 is a normal measurement error; the root cause of the outlier is the imbalance of the measurement data collection, and the high precision obtained by eliminating the outlier comes at the expense of trueness.
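The effect can be checked numerically. The Table 2 readings below are reconstructed from the stated true masses and round-half-up rounding (an assumption; the paper's own table may differ in presentation):

```python
import numpy as np

# Reconstruction of Example 2. Rows of A: A, B, C, A+B, A+C, B+C, A+B+C.
A = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
truth = np.array([5.1, 4.2, 7.2])
l = np.floor(A @ truth + 0.5)          # balance readings rounded to the gram

# Least squares with all seven observations: v_7 stands out as an "outlier".
x_all = np.linalg.lstsq(A, l, rcond=None)[0]
v_all = A @ x_all - l

# "Eliminate the outlier" and re-adjust with the first six observations only.
x_del = np.linalg.lstsq(A[:6], l[:6], rcond=None)[0]

err_all = np.abs(x_all - truth)        # errors without elimination
err_del = np.abs(x_del - truth)        # errors after elimination (larger)
```

With all seven observations the estimates are (5.125, 4.125, 7.125) with v_7 = −0.625; after deleting the seventh equation the residuals vanish entirely, yet the estimates (5, 4, 7) are further from the true values, exactly as the text argues.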

Successive approximation algorithm
It has been confirmed that outliers should not be eliminated. Then, in order to benefit the final measured values, can we use an approximate function model of the error? The answer is yes. The periodic error in Example 1 can be treated with a function model because the errors are functions of the measurement serial number i; in essence, v_i is a function of i. But v_i in Example 2 is also a function of i, and v_i in any measurement is a function of i; the problem is only that these functional relationships cannot be expressed as an exact mathematical model as in Example 1. Even so, using an approximate mathematical model should still be effective.
Following this idea, consider the distribution of v_i in Example 2. After the adjustment, v_7 = 0.625 is an outlier; the reason is that the rounding error (rectangular distribution, sawtooth law) is sampled unevenly. Once the residuals form two distinct groups, we can reasonably believe that they contain a regular error sub-item whose contribution to the two groups is equal in magnitude but opposite in sign. We can then use the function model of this sub-item to improve the observation error equations and bring the two residual groups closer together; the principle is similar to Fourier series approximation. Accordingly, the error equations of Example 2 are augmented with a parameter k whose coefficient s_i is +1 for one residual group and −1 for the other, i.e. v_i = a_i X̂ + s_i k̂ − l_i, where a_i is the coefficient row of the original error equation. After re-adjustment, the residuals are clearly approached: their distribution improves and the outlier phenomenon disappears. However, two groups of positive and negative residuals obviously remain, so the same idea is applied again and the error equations are augmented once more; the residuals then approach 0. Applying this method to Example 1, after four rounds of the approximation algorithm the measured value becomes 8.000032; compared with the value 8.0014 obtained from the random model, the error of the measured value is reduced by about 45 times. The function model parameter values of rounds 1 to 4 show obvious convergence, which also demonstrates the effectiveness of the method.
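One possible implementation of this sign-column augmentation can be sketched on the reconstructed Example 2 data (the readings are inferred from the stated true masses and rounding rule, an assumption noted above):

```python
import numpy as np

# Successive approximation sketch: each round appends a +/-1 column built
# from the signs of the current residuals (modelling the two residual groups
# as s_i * k) and re-adjusts by least squares.
A = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]], dtype=float)
l = np.array([5., 4., 7., 9., 12., 11., 17.])

def approximate(A, l, rounds=0):
    D = A
    for _ in range(rounds):
        sol = np.linalg.lstsq(D, l, rcond=None)[0]
        v = D @ sol - l
        s = np.where(v >= 0, 1.0, -1.0)       # split residuals by sign
        D = np.column_stack([D, s])           # augment with the group offset
    sol = np.linalg.lstsq(D, l, rcond=None)[0]
    v = D @ sol - l
    return sol[:A.shape[1]], v

x0, v0 = approximate(A, l, rounds=0)          # plain adjustment: v_7 = -0.625
x1, v1 = approximate(A, l, rounds=1)          # one augmentation: outlier gone
```

After a single round the largest residual drops from 0.625 to about 0.11 and the estimates move toward (5.20, 4.20, 7.20), i.e. closer to the true masses for two of the three objects; further rounds continue to shrink the residuals as described in the text.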
In both cases, the observations are simulated as true value plus error, and the effectiveness of the algorithm is tested by the ability of the final measured value to respond to the true value. In both cases, the final measured values are greatly improved compared with the pure random model. Now consider the application effect in a practical case. Table 3 is a set of observation data from calibrating the additive and multiplicative constant errors of a geodimeter by the six-segment baseline method.
The error equation of each baseline section takes the form v_i = ĉ + k̂·D_i − l_i, where ĉ is the additive constant, k̂ the multiplicative constant, D_i the section length and l_i the observed discrepancy, and the least squares solution is obtained accordingly. After four rounds of the approximation algorithm, the quality of the results is greatly improved, because the approximation algorithm corrects regular errors such as the residual periodic error of the geodimeter and thus greatly weakens their impact on the calibration results.
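The basic calibration fit has a simple linear form. Table 3's actual data are not reproduced here, so the sketch below uses invented baseline lengths and constants purely to illustrate the error equation v_i = ĉ + k̂·D_i − l_i:

```python
import numpy as np

# Hypothetical sketch of the additive/multiplicative constant calibration.
# The section lengths and the constants c0, k0 are illustrative assumptions,
# not the paper's Table 3 values.
D = np.array([24., 48., 72., 96., 120., 144.])   # assumed section lengths (m)
c0, k0 = 0.0021, 3.0e-6                           # assumed additive (m) and scale errors
l = c0 + k0 * D                                   # noise-free simulated discrepancies

# Least squares fit of the two instrument constants.
design = np.column_stack([np.ones_like(D), D])
c_hat, k_hat = np.linalg.lstsq(design, l, rcond=None)[0]
```

On noise-free simulated data the fit recovers the assumed constants exactly; with real Table 3 data, the residuals of this fit are what the successive approximation rounds then operate on.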
In short, the key to understanding this principle is to abandon the traditional assumption that the residuals are white noise [3]. The theoretical basis of the method is that various regular errors are the root causes of dispersion and outliers; the residuals not only follow a random distribution (including abnormal distributions) but also have regularity; and an error can be treated according to a function model as well as a random model. The principle is to use an approximate function model of the error to approach the actual function model of the error step by step, realizing error correction. In essence it fits the error with a function model, so it can effectively overcome the imbalance of sampling. Of course, the method is effective only if the overall law of the error has been sampled and is not seriously missing (eliminated). It should be pointed out that, with sufficient redundant observations, the precision can be much higher than that of random-model processing, because the details of the residuals' law can then be displayed and the model fitting can be accurate; with few repeated observations, the details cannot be fully displayed, the number of approximation rounds is limited, and the fitting effect is naturally limited.

About gross error
The rational use of outliers has been discussed above. Another question remains: how to identify gross errors caused by wrong measurement? The answer is to take the maximum permissible error (MPE) of the measuring instrument (sensor) as the judgment basis. When the standard deviation given by least squares is consistent with this index, all observations can of course be judged normal; conversely, a wrong operation (including instrument failure) can produce errors thousands of times the nominal tolerance of the instrument, which is very easy to find in the adjustment and needs no complex mathematics at all. For example, early phase geodimeters exhibited a "ten-meter" fault, whose error value was an integer multiple of the fine-ruler length (mostly 10 m in early rangefinders; the fault rarely appeared after improved instrument designs). The physical cause is that a phase photoelectric rangefinder measures with several electric rulers of different lengths; when the measurement error of the long ruler exceeds 1/2 of the fine-ruler length, the fine and coarse measurements are joined incorrectly. Such an error can reach thousands of times the nominal precision of the instrument and can even appear in several observation equations of a traverse network. When the standard deviation given by least squares reaches thousands of times the nominal tolerance of the instrument, anyone can see that the data are faulty, so this kind of gross error in a traverse network is easily identified without complex mathematical methods. The remedy is usually to send the instrument for repair and inspection and to re-measure.
I believe no one would dare to use some mathematical method to salvage this kind of measurement data containing a large number of gross errors.
That is, there is no necessary relationship between outliers and gross errors; an outlier is not necessarily a gross error.
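The MPE-based judgment described above amounts to a simple consistency check. A minimal sketch, with a hypothetical function name and an illustrative threshold factor not taken from the paper:

```python
def data_look_normal(std_dev, mpe, factor=3.0):
    """Return True when the least squares standard deviation is consistent
    with the instrument's maximum permissible error (MPE).

    A ratio of thousands, as in the 'ten-meter' fault described above,
    signals wrong measurement; the factor here is an illustrative choice.
    """
    return std_dev <= factor * mpe
```

For example, a 4 mm standard deviation against a 5 mm MPE passes, while a standard deviation a thousand times the MPE plainly fails, with no complex mathematics needed.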

Conclusion
Through the physical mechanism of the dispersion and outliers of repeated observations, this paper has shown that a discrete error sample sequence follows both a random distribution and a regularity, and that the adjustment can be realized not only by self-compensation but also by function-model correction. On this basis, a correction algorithm that uses an approximate function model of the error for successive approximation has been derived; it is essentially the same as traditional function-model processing, greatly improves the quality of the measured values, and effectively overcomes the influence of outliers. This algorithm should play a positive role in promoting the development of measurement theory.