Discussion on the damping factor and improvement of the Levenberg-Marquardt method


 The ill-posed problem is the key obstacle to obtain the accurate inversion results in the geophysical inversion field, and the Levenberg-Marquardt1, 2(hereinafter referred to as the L-M method) method has been widely used as it can effectively improve the ill-posed problems. However, the inversion results obtained by the L-M method are usually stable but incorrect, the reason is that the damping factor in the L-M method is difficult to solve, and it is usually approximated with a positive constant by experience or through some fitting methods. This paper uses the binary gravity model to demonstrate that the damping factor in the L-M method cannot be regarded as a positive constant only, it should have the following characteristics: (i) the damping factor is a vector, not just a constant; (ii) the values of the vector are composed of both positive and negative constants, not just positive constants; (iii) the corresponding value in the vector is close or equal to ∞ when the corresponding density block’s value is close or equal to zero. Even if the above characteristics have been found in the L-M method, it is difficult to reasonably estimate the damping factor as the damping factor oscillate severely due to the third characteristic, and the improved L-M method is proposed which effectively avoids the damping factor’s severe oscillation problem. The strategy of obtaining the reasonable damping factor is given finally.


Introduction
The significance of the gravity inversion study for exploring the underground density structure is self-evident. Generally, the inversion models are usually simplified before implementing the gravity inversion study, and the binary or ternary inversion models have received widespread attention 3, 4,5,6,7,8,9,10,11,12 . However, even if we implement our inversion research on the binary or ternary model, the inversion results are usually not ideal because they are severely affected by the ill-posed problems which usually appear during the gravity inversion calculations.
Most scholars try to solve the ill-posed problems based on the L-M method which is also called the Regularization method 13 (the L-M inversion equation is constructed by the residual term, the 2 damping term and the damping factor, which are equivalent to the residual term, the Regularization term and the Regularization factor in the Regularization equation, respectively). In other words, the L-M method are the foundation of the geophysical inversion field and most of the new inversion methods are derived from the L-M method. The ill-posed problems can be well solved by the L-M method, but the inversion results are usually stable but not correct. The key reason is that the damping factor of the L-M method is always considered to be a positive constant! (Some methods derived from the L-M method indeed obtain ideal inversion results but these methods are established on adding additional constraints which are often difficult to obtain in practical applications or promote due to their particularities 7,14,15,16,17,18 . This paper does not consider the situation of adding additional constraints, and only the unique constraint of the L-M method is further processed and used for the Binary inversion model in this paper). This paper is devoted to demonstrate why the damping factor of the L-M method cannot be just a positive constant, summarizing the characteristics of the damping factor of the L-M method, and improving the L-M method with almost no additional constraints.

Discussion of the damping factor of the L-M method
Analysis of the principle of the damping factor of the L-M method This paper takes the 2.5-dimensional (2.5D) binary inversion model as an example. Fig.1 represents a profile of the 2.5D binary model composed by the density blocks (q × 1),and the corresponding gravity anomaly data containing the Gaussian noise is ̃( p × 1). The kernel function matrix is G(p × q). Apparently, we have: � represents the approximation of the real density value .
3 Fig.1 Illustration of the 2.5D Binary inversion model composed by the density blocks (q × 1) and the corresponding gravity anomaly data ̃( p × 1) If the number of the gravity data is greater than the number of the unknown density blocks (p > q) and the kernel matrix G is not ill-posed, we only need to use the simple Least Squares method to easily obtain the ideal inversion results:

( )
represents the weighting function of the gravity anomaly data ̃.
After differentiating the above formula, we can obtain: (3) In equation (3), · T is the transpose of ·.
However, the kernel matrix G is usually ill-posed so that we cannot simply apply equation (3) to the inversion calculations (Generally, G is well-posed only if the number of the density blocks is extremely limited).
Fortunately, there is always some additional geophysical information which can be easily obtained without wasting too much manpower and financial resources in the geophysical inversion field, and the information can be applied to the inversion calculation so that the inversion solution can be constrained in a more reasonable or compact range. For example, the values of the density blocks must be within a certain range (e.g. greater than zero and less than infinity at least, and cannot 4 be any real numbers) for the underground substances, then we can easily obtain the following constraint equation: This constraint is also applied to the Binary inversion model for which there are only two kinds of values in the inversion model.
And combined with equation (4), Marquardt give the following well-known L-M inversion equation based on equation (2): In equation (5), k is a positive constant which does not need to be known in the inversion calculation process. According to equation (5), using the Lagrange Multiplier Method, we can obtain: Furthermore, we can obtain: Here, λ 0 is regarded as a positive constant 1, 2 , I is the identity matrix. Comparing with equation (3), the positive constant λ 0 is added to the diagonal of G T W d T W d G in equation (7), which makes the condition number of G T W d T W d G improved and the inversion solution stable.
Although the L-M method makes the inversion solution stable, the inversion solution is usually incorrect which is greatly different from the actual density value. The specific analysis is as follows: Assuming the real gravity data and the corresponding noise are and ∆ , respectively, and the real values of the density blocks and the corresponding residual of the inversion results caused by ∆ are and ∆ , respectively, which is: Now substitute equation (8) Because G = , and we have： = (10) Now substitute equation (10) into equation (9): Apparently, we hope � = + ∆ → or ∆ → 0，then Now substitute equation (12) into equation (11): According to equation (13), the accurate λ 0 can be obtained if the real density value and the noise of the gravity data ∆ is known. Now the specific form of the ith expression in equation (13) is given as follows: In equation (14), represents the ith value of the real values , G T represents the ith row of G T .
Then we give two conclusions as follows based on equation (14): i) λ 0 in the L-M method should be replaced by a vector ( = [λ 1 , λ 2 , … , λ q ] T ).
Analysis: if m i ≠ 0, according to equation (14), for each m i : And according to the L-M method, λ 0 is a constant, so the following expression should always hold: However, we know that G T i ≠ G T j due to the characteristics of G , and apparently m i is probably not equal to m j , so in equation (16).Therefore, 6 equation (16)  shown in equation (15). Instead, we should use a series of constants which construct a new vector to represent So, the L-M method shown in equation (7) should be modified into the following form: diag( ) represents a diagonal matrix with the elements of the vector .
Analysis: This conclusion is obvious according to equation (17), the value of will be approaching or equal to ∞ when the corresponding density value → 0 or = 0 . Because when G T i W∆ ≠ 0, needs to play a balanced role to assure equation (17) holds.
Synthetic test on the damping factor of the L-M method We give a specific synthetic test to verify the correctness of the discussion on the damping factor of the L-M method above. In Fig.2a, the gravity dataset is obtained by forward modelling, in which the red dots represent the real values of the gravity dataset and the black line represents the contaminated gravity dataset by Gaussian noise equivalent to 5% of the accurate datum. And in For this synthetic test, first, we substitute the actual density value and the actual Gaussian noise ∆ into equation (17) to obtain the accurate of the L-M method shown in Fig. 3, and is equal to a very tiny constant (e.g. 10 -50 ) to replace 0 when = 0. We find that fluctuates severely between about -6×10 33~2 ×10 33 in Fig. 3 Second, we substitute the accurate shown in Fig. 3 into equation (18) and the inversion results is obtained by the L-M method shown in Fig. 4. We find that the inversion results are completely consistent with the real values , which means the damping factor by equation (17) is correct for obtaining the accurate inversion results. negative and even a constant approaching ∞ (Fig. 3). This is very different from what we thought in the past that "the damping factor is just a tiny and positive constant".

The modified L-M method based on the Binary inversion model
The principle of the modified L-M method based on the Binary inversion model In the previous section, on one hand, we find that the accurate damping factor can be obtained by the real density value and the real Gaussian noise ∆ , but neither of them can be known in the actual situation. Therefore, we must roughly estimate in some way. On the other hand, of the L-M method fluctuates severely as shown in Fig. 3 Apparently, the tighter constraint can be imposed on equation (2), which is: Furthermore, the new form of gravity inversion equation is given as follows:  Comparing to equation (6), the term 1 2 (m 1 + m 2 ) is added to the right side of equation (22) besides the damping factor converting from λ 0 to the vector .
And the specific inversion solution is as follows: Next, we still use the same analysis process of the L-M method (equation 8~14) to analyze the modified L-M method. We substitute equation (8) into equation (22), and if we want ∆ → 0, then: And for the ith expression in equation (24), we have: And for the Binary inversion model, because = 1 or 2 , we substitute it into equation 25 and we obtain: (26) Synthetic test on the damping factor of the modified L-M method We still use the inversion model and the corresponding gravity data shown in Fig. 2. The damping factor is obtained by equation (26)   We also calculate the inversion results � shown in Fig. 6 using obtained by the modified L-M method shown in Fig. 5 and � is the same as the real values .
In equation (27), ̃ represents the estimation of . ∆̃ is the Gaussian noise with the same distribution as ∆ (we can use math software to produce ∆̃ easily). 1 and 2 represent the number of the density blocks equal to m 1 and m 2 , respectively. τ is a constant equal to -1 or 1 (determined randomly). (27) For the inversion model shown in Fig. 2, the number of = 1 (16 blocks) are almost the same as = 2 (14 blocks), we use equation 27-3 to obtain ̃. And the estimation ̃ and the accurate are drawn in the figure (Fig. 7), we can see that ̃ and are very close, the relative error obtained by equation 28 is only about 104.50% for the extremely tiny (The maximum value does not exceed 3 × 10 -16 ).

Analysis of equation
13

(triangles)
However, ̃ is very close to shown in Fig. 7, the inversion results are still not ideal except that the shallowest layer of the inversion results is in good agreement with the actual situation. We use a new form of figure (Fig.8) to make it easier to observe the differences between the inversion results � and the real density values . The abscissa in Fig.8 represents the density blocks arranged in the specified order corresponding to the order of the density blocks from left to right and top to bottom in Fig. 2, and the ordinate represents the value of the corresponding density block.
In Fig. 8, the shallowest layer (the 1 st to 6 th density blocks) of the density blocks is indeed ideal but the others are not (the 7 th to 30 th density blocks).