A new uncertain linear regression model based on equation deformation

Hongmei Shi (yqbshm@163.com), Shuai Wang (wangshuai@sdyu.edu.cn), Yufu Ning (nyf@sdyu.edu.cn)

When the observed data are imprecise, an uncertain regression model is more suitable for linear regression analysis. Least squares estimation takes all the given data into account by minimizing the sum of squared residuals, and it can effectively solve linear regression equations with imprecisely observed data. Based on uncertainty theory, this paper presents an equation deformation method for solving the unknown parameters of uncertain linear regression equations. We first establish the equation deformation method for the one-dimensional linear regression model and then extend it to the multiple linear regression model. We also combine the equation deformation method with Cramer's rule and with elementary matrix transformations, which gives two further ways to solve for the unknown parameters of the uncertain linear regression equation. A numerical example shows that the equation deformation method can effectively solve for the unknown parameters of the uncertain linear regression equation.


Introduction
Regression analysis is an important branch of statistics: it studies the relationship between response variables and explanatory variables. It is also one of the most commonly used statistical tools (Chatterjee and Hadi 2006), with a very wide range of applications. Linear regression is an important model in regression analysis. One-dimensional linear regression can be described by a linear equation, and multiple linear regression by a linear combination of variables (Ma 2014). Linear regression can describe linear relationships between random variables, but it requires precise data, and it requires the random variables to follow, or approximately follow, probability distributions. In practice, however, the observed data of many problems are imprecise, the available information is insufficient, or no data are available at all. In such cases, traditional regression analysis runs into difficulties.
When observed data are imprecise or information is unavailable, we often invite domain experts to estimate how likely an event is to occur or to predict its possible range (Kahneman and Tversky 1979). We call such expert-provided data belief degrees. Because belief degrees usually do not approximate a probability distribution, a different framework is needed to handle them. Uncertainty theory (Liu 2007, 2012) is well suited to studying and analyzing belief degrees. Liu (2007) founded uncertainty theory and subsequently refined and developed it (Liu 2009, 2010, 2015, 2017). As research has deepened, uncertainty theory has been applied successfully in many fields (Ning 2019, 2017; Ning et al. 2013a, b, 2019; Liu and Ha 2010). Liu (2010) first proposed uncertain statistics and designed a questionnaire survey method for constructing the uncertainty distributions of uncertain variables. Chen and Ralescu (2012) applied Liu's questionnaire survey to estimate the distance from Tianjin to Beijing, and the results showed that the approach is efficient. To estimate the unknown parameters of an uncertainty distribution, Liu (2010) proposed the principle of least squares. As uncertain statistics developed further, Wang and Peng (2014) put forward the method of moments for estimating unknown parameters, and Guo et al. (2017) proposed an uncertain linear regression model. Yao and Liu (2018) used the principle of least squares to derive point estimates of the unknown parameters of uncertain regression equations, a method designed for imprecisely observed data. Song and Fu (2018) proposed a least squares method for the unknown parameters of uncertain multiple linear regression, and Chen (2020) proposed Tukey's biweight estimation for uncertain regression models with imprecise observations.
Least squares estimation can solve for the parameters of uncertain linear regression equations, but it requires some background in advanced mathematics. When there are many explanatory variables or a large amount of data, least squares estimation also encounters difficulties. This paper presents an equation deformation method for solving the unknown parameters of linear regression equations. The method obtains the unknown parameters using only transformations of the equations themselves, so it is easy to understand and easy to use.
The rest of this paper is organized as follows. Section 2 gives the formulas for computing expected values, introduces the uncertain regression model, and presents the least squares estimation of its unknown parameters. In Sect. 3, we propose the equation deformation method based on uncertainty theory, derive its solution process in detail, and then extend it to the multiple linear regression model. In Sect. 4, we propose two auxiliary solution methods for the equation deformation method, based on Cramer's rule and on the elementary transformation of matrices. In Sect. 5, we verify the feasibility of the equation deformation method with a numerical example and compare it with the existing method. Finally, Sect. 6 summarizes the paper.

Liu (2007) founded uncertainty theory on three axioms: the Normality Axiom, the Duality Axiom and the Subadditivity Axiom. Liu (2009) completed the theory by adding the Product Axiom. Uncertainty theory defines uncertain variables and uncertainty distributions, and the expected value of an uncertain variable can be computed from its inverse uncertainty distribution. Readers interested in the other basic concepts and results of uncertainty theory can consult Liu (2017).

Uncertain regression model
In this section, we first introduce the concept of expected value and how to compute it. We then present the uncertain regression model and introduce the least squares estimation method for solving the regression equation.
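Theorem 2.1 (Liu 2010) Let ξ be an uncertain variable with regular uncertainty distribution Φ. Then

$$
E[\xi] = \int_0^1 \Phi^{-1}(\alpha)\,\mathrm{d}\alpha. \tag{1}
$$

Theorem 2.2 (Liu 2010) Let ξ and η be independent positive uncertain variables with regular uncertainty distributions Φ and Ψ, respectively. Then

$$
E[\xi\eta] = \int_0^1 \Phi^{-1}(\alpha)\,\Psi^{-1}(\alpha)\,\mathrm{d}\alpha. \tag{2}
$$

Assume that $(x_1, x_2, \ldots, x_p)$ is a vector of explanatory variables and that y is the corresponding response variable. Suppose further that $(x_1, x_2, \ldots, x_p)$ and y have a functional relationship expressed by the regression model

$$
y = f(x_1, x_2, \ldots, x_p \mid \boldsymbol\beta) + \varepsilon,
$$

where $\boldsymbol\beta$ is the vector of unknown parameters and ε is a disturbance term (Liu 2017). We call

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon
$$

a linear regression model (Liu 2017). Now suppose that we have a set of imprecisely observed data

$$
(\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{ip}, \tilde y_i), \quad i = 1, 2, \ldots, n,
$$

where the $\tilde x_{ij}$ and $\tilde y_i$ are uncertain variables. Based on these data, Yao and Liu (2018) proposed the least squares estimate of $\boldsymbol\beta$ in the regression model: $\boldsymbol\beta$ is the solution of the minimization problem

$$
\min_{\boldsymbol\beta} \sum_{i=1}^{n} E\left[\left(\tilde y_i - f(\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{ip} \mid \boldsymbol\beta)\right)^2\right].
$$

If the minimizer is $\boldsymbol\beta^*$, the fitted regression equation is $y = f(x_1, x_2, \ldots, x_p \mid \boldsymbol\beta^*)$. When p = 1, the linear regression equation is called a one-dimensional linear regression equation; when p ≥ 2, it is called a multiple linear regression equation. For each index i (i = 1, 2, ..., n), the term

$$
\hat\varepsilon_i = \tilde y_i - f(\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{ip} \mid \boldsymbol\beta^*)
$$

is called the i-th residual (Lio and Liu 2018). If the disturbance term ε is an uncertain variable, its expected value and variance can be estimated as

$$
\hat e = \frac{1}{n}\sum_{i=1}^{n} E[\hat\varepsilon_i]
\qquad\text{and}\qquad
\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} E\left[(\hat\varepsilon_i - \hat e)^2\right],
$$

respectively, where $\hat\varepsilon_i$ is the i-th residual, i = 1, 2, ..., n (Lio and Liu 2018).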
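To make Eqs. (1) and (2) concrete, the following Python sketch approximates both integrals with a midpoint rule on (0, 1). It assumes NumPy, and it uses two standard uncertain variables as illustrative inputs: a linear uncertain variable L(a, b), whose inverse distribution is (1 − α)a + αb, and a zigzag uncertain variable Z(a, b, c). The helper names and the quadrature are illustrative choices, not part of the paper.

```python
import numpy as np

# Midpoint grid on (0, 1): the mean of g over ALPHA approximates int_0^1 g(alpha) d alpha.
N_ALPHA = 100_000
ALPHA = (np.arange(N_ALPHA) + 0.5) / N_ALPHA


def linear_inv(a, b):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    return lambda alpha: (1 - alpha) * a + alpha * b


def zigzag_inv(a, b, c):
    """Inverse uncertainty distribution of a zigzag uncertain variable Z(a, b, c)."""
    return lambda alpha: np.where(
        alpha < 0.5,
        (1 - 2 * alpha) * a + 2 * alpha * b,
        (2 - 2 * alpha) * b + (2 * alpha - 1) * c,
    )


def expected_value(inv):
    """Eq. (1): E[xi] = int_0^1 Phi^{-1}(alpha) d alpha."""
    return float(np.mean(inv(ALPHA)))


def expected_product(inv_xi, inv_eta):
    """Eq. (2): E[xi * eta] = int_0^1 Phi^{-1}(alpha) Psi^{-1}(alpha) d alpha.
    Valid only for independent, positive xi and eta."""
    return float(np.mean(inv_xi(ALPHA) * inv_eta(ALPHA)))


xi = linear_inv(1.0, 3.0)        # L(1, 3): E[xi] = (1 + 3) / 2 = 2
eta = zigzag_inv(2.0, 4.0, 8.0)  # Z(2, 4, 8): E[eta] = (2 + 2*4 + 8) / 4 = 4.5
print(expected_value(xi), expected_value(eta), expected_product(xi, eta))
```

For these inputs the exact value of E[ξη] is 10, whereas E[ξ]E[η] = 9, so Eq. (2) is not simply the product of the two expected values.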

The equation deformation method
In this section, we propose an equation deformation method, based on uncertainty theory, for solving the unknown parameters. The idea is to construct as many equations as there are unknown parameters, convert them into equations with real coefficients by taking expected values, and then solve them. We then extend the method to multiple linear regression equations.

Equation deformation method for one-dimensional linear regression model
We always assume that $(\tilde x_i, \tilde y_i)$, $i = 1, 2, \ldots, n$, is a set of imprecisely observed data, where $\tilde x_i$ and $\tilde y_i$ are independent uncertain variables with regular uncertainty distributions $\Phi_i$ and $\Psi_i$, $i = 1, 2, \ldots, n$, respectively. Suppose that the data satisfy the linear regression equation

$$
\tilde y_i = \beta_0 + \beta_1 \tilde x_i, \quad i = 1, 2, \ldots, n, \tag{13}
$$

where $\beta_0$ and $\beta_1$ are the unknown parameters. Equation (13) has two unknown parameters, so if it has a solution, two equations suffice to determine it. Equation (13) contains n independent equations, and the unknown parameters could be obtained from any two of them. However, two arbitrarily selected equations use only part of the data and cannot represent the overall properties of the variables. To take the values of all the variables into account and reduce the error, we construct two new equations as follows.
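Taking the expected values of both sides of Eq. (13) turns it into an equation with real coefficients:

$$
E[\tilde y_i] = \beta_0 + \beta_1 E[\tilde x_i], \quad i = 1, 2, \ldots, n. \tag{14}
$$

Adding the n equations in Eq. (14) gives

$$
\sum_{i=1}^{n} E[\tilde y_i] = n\beta_0 + \beta_1 \sum_{i=1}^{n} E[\tilde x_i], \tag{15}
$$

which can be rewritten as

$$
n\beta_0 + \beta_1 \sum_{i=1}^{n} E[\tilde x_i] = \sum_{i=1}^{n} E[\tilde y_i]. \tag{16}
$$

By Eq. (1), Eq. (16) becomes

$$
n\beta_0 + \beta_1 \sum_{i=1}^{n} \int_0^1 \Phi_i^{-1}(\alpha)\,\mathrm{d}\alpha = \sum_{i=1}^{n} \int_0^1 \Psi_i^{-1}(\alpha)\,\mathrm{d}\alpha. \tag{17}
$$

Multiplying both sides of Eq. (13) by $\tilde x_i$ gives

$$
\tilde x_i \tilde y_i = \beta_0 \tilde x_i + \beta_1 \tilde x_i^2, \quad i = 1, 2, \ldots, n. \tag{18}
$$

Adding the n equations in Eq. (18) gives

$$
\sum_{i=1}^{n} \tilde x_i \tilde y_i = \beta_0 \sum_{i=1}^{n} \tilde x_i + \beta_1 \sum_{i=1}^{n} \tilde x_i^2. \tag{19}
$$

Taking expected values and applying Eqs. (1) and (2), with $\tilde x_i$ and $\tilde y_i$ assumed positive, Eq. (19) becomes

$$
\beta_0 \sum_{i=1}^{n} \int_0^1 \Phi_i^{-1}(\alpha)\,\mathrm{d}\alpha + \beta_1 \sum_{i=1}^{n} \int_0^1 \left(\Phi_i^{-1}(\alpha)\right)^2 \mathrm{d}\alpha = \sum_{i=1}^{n} \int_0^1 \Phi_i^{-1}(\alpha)\,\Psi_i^{-1}(\alpha)\,\mathrm{d}\alpha. \tag{20}
$$

Here $E[\tilde x_i^2] = \int_0^1 (\Phi_i^{-1}(\alpha))^2\,\mathrm{d}\alpha$ because $x^2$ is strictly increasing for $x > 0$. Solving Eqs. (17) and (20) gives the estimated values of $\beta_0$ and $\beta_1$.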
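Equations (17) and (20) form a 2 × 2 linear system in β0 and β1. The Python sketch below assembles and solves that system, assuming NumPy, a midpoint-rule quadrature for the integrals, and linear uncertain variables L(a, b) as inputs. The data are hypothetical: they are constructed so that each Ψ_i⁻¹ equals 5 + 2Φ_i⁻¹, so the method should return β0 = 5 and β1 = 2.

```python
import numpy as np

# Midpoint grid on (0, 1): averaging g over ALPHA approximates int_0^1 g(alpha) d alpha.
N_ALPHA = 10_000
ALPHA = (np.arange(N_ALPHA) + 0.5) / N_ALPHA


def linear_inv(a, b):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    return lambda alpha: (1 - alpha) * a + alpha * b


def equation_deformation_1d(x_invs, y_invs):
    """Estimate (beta_0, beta_1) by solving Eqs. (17) and (20).

    x_invs[i] and y_invs[i] are Phi_i^{-1} and Psi_i^{-1}; all variables are
    assumed independent and positive.
    """
    n = len(x_invs)
    phi = np.array([f(ALPHA) for f in x_invs])  # shape (n, N_ALPHA)
    psi = np.array([g(ALPHA) for g in y_invs])
    sum_ex = phi.mean(axis=1).sum()             # sum_i E[x_i]
    sum_ey = psi.mean(axis=1).sum()             # sum_i E[y_i]
    sum_exx = (phi * phi).mean(axis=1).sum()    # sum_i E[x_i^2]
    sum_exy = (phi * psi).mean(axis=1).sum()    # sum_i E[x_i * y_i]
    # Eq. (17): n * beta_0 + beta_1 * sum_ex = sum_ey
    # Eq. (20): beta_0 * sum_ex + beta_1 * sum_exx = sum_exy
    A = np.array([[n, sum_ex], [sum_ex, sum_exx]])
    rhs = np.array([sum_ey, sum_exy])
    b0, b1 = np.linalg.solve(A, rhs)
    return float(b0), float(b1)


# Hypothetical data: x_i ~ L(i, i + 1) and y_i ~ L(5 + 2i, 7 + 2i), i = 1..5.
xs = [linear_inv(i, i + 1) for i in range(1, 6)]
ys = [linear_inv(5 + 2 * i, 7 + 2 * i) for i in range(1, 6)]
beta0, beta1 = equation_deformation_1d(xs, ys)
print(beta0, beta1)
```

Because the same quadrature is used on both sides of each equation, (5, 2) satisfies the discretized system exactly, so the estimates match up to floating-point rounding.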
The derivation of the one-dimensional equation deformation method is relatively simple. Using all the data, we construct two equations, and the unknown parameters are obtained using only equation deformation and expected values. The method does not require a background in advanced mathematics and is easy to understand and compute.

Equation deformation method for multiple linear regression model
The equation deformation method can solve for the unknown parameters of one-dimensional linear regression; can it also solve for those of a multiple linear regression equation? We derive it below. Assume that there is a linear functional relationship between the uncertain variables $(\tilde x_{i1}, \tilde x_{i2}, \ldots, \tilde x_{ip})$ and $\tilde y_i$. To solve for the p + 1 unknown parameters of the linear regression equation, an effective approach is to establish p + 1 equations. Based on uncertainty theory, we propose an equation deformation method that solves for these parameters using expected values and takes full account of the imprecisely observed data. The specific steps are as follows.
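Step 1. Assume that the linear regression equation is

$$
\tilde y_i = \beta_0 + \beta_1 \tilde x_{i1} + \beta_2 \tilde x_{i2} + \cdots + \beta_p \tilde x_{ip}, \quad i = 1, 2, \ldots, n, \tag{21}
$$

where $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ are unknown parameters. Adding up the n equations in Eq. (21) gives

$$
\sum_{i=1}^{n} \tilde y_i = n\beta_0 + \beta_1 \sum_{i=1}^{n} \tilde x_{i1} + \beta_2 \sum_{i=1}^{n} \tilde x_{i2} + \cdots + \beta_p \sum_{i=1}^{n} \tilde x_{ip}. \tag{22}
$$

Multiplying both sides of Eq. (21) by $\tilde x_{i1}$ gives

$$
\tilde x_{i1}\tilde y_i = \beta_0 \tilde x_{i1} + \beta_1 \tilde x_{i1}^2 + \beta_2 \tilde x_{i1}\tilde x_{i2} + \cdots + \beta_p \tilde x_{i1}\tilde x_{ip}, \quad i = 1, 2, \ldots, n, \tag{23}
$$

and adding up the n equations in Eq. (23) gives

$$
\sum_{i=1}^{n} \tilde x_{i1}\tilde y_i = \beta_0 \sum_{i=1}^{n} \tilde x_{i1} + \beta_1 \sum_{i=1}^{n} \tilde x_{i1}^2 + \beta_2 \sum_{i=1}^{n} \tilde x_{i1}\tilde x_{i2} + \cdots + \beta_p \sum_{i=1}^{n} \tilde x_{i1}\tilde x_{ip}. \tag{24}
$$

In the same way, multiplying both sides of Eq. (21) by $\tilde x_{i2}$ and adding up the n resulting equations gives Eqs. (25) and (26), and so on, until multiplying by $\tilde x_{ip}$ and adding up gives Eqs. (27) and (28). Together these are the p equations

$$
\sum_{i=1}^{n} \tilde x_{ik}\tilde y_i = \beta_0 \sum_{i=1}^{n} \tilde x_{ik} + \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} \tilde x_{ik}\tilde x_{ij}, \quad k = 1, 2, \ldots, p. \tag{29}
$$

Eqs. (22) and (29) together contain p + 1 equations. If the system has a solution, $\beta_0, \beta_1, \ldots, \beta_p$ can be determined from these p + 1 equations.

Step 2. Let $\Phi_{ij}$ and $\Psi_i$ denote the regular uncertainty distributions of $\tilde x_{ij}$ and $\tilde y_i$, and assume that these variables are independent and positive. Taking the expected values of both sides of Eqs. (22) and (29) gives the real-coefficient system

$$
\begin{cases}
\displaystyle\sum_{i=1}^{n} E[\tilde y_i] = n\beta_0 + \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} E[\tilde x_{ij}],\\[2ex]
\displaystyle\sum_{i=1}^{n} E[\tilde x_{ik}\tilde y_i] = \beta_0 \sum_{i=1}^{n} E[\tilde x_{ik}] + \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} E[\tilde x_{ik}\tilde x_{ij}], \quad k = 1, 2, \ldots, p.
\end{cases} \tag{30}
$$

The first equation in Eq. (30) can be rewritten as

$$
n\beta_0 + \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} E[\tilde x_{ij}] = \sum_{i=1}^{n} E[\tilde y_i]. \tag{31}
$$

By Eq. (1), Eq. (31) becomes

$$
n\beta_0 + \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} \int_0^1 \Phi_{ij}^{-1}(\alpha)\,\mathrm{d}\alpha = \sum_{i=1}^{n} \int_0^1 \Psi_i^{-1}(\alpha)\,\mathrm{d}\alpha. \tag{32}
$$

The other equations in Eq. (30) can be rewritten as

$$
\beta_0 \sum_{i=1}^{n} E[\tilde x_{ik}] + \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} E[\tilde x_{ik}\tilde x_{ij}] = \sum_{i=1}^{n} E[\tilde x_{ik}\tilde y_i], \quad k = 1, 2, \ldots, p. \tag{33}
$$

By Eqs. (1) and (2), Eq. (33) becomes

$$
\beta_0 \sum_{i=1}^{n} \int_0^1 \Phi_{ik}^{-1}(\alpha)\,\mathrm{d}\alpha + \sum_{j=1}^{p} \beta_j \sum_{i=1}^{n} \int_0^1 \Phi_{ik}^{-1}(\alpha)\,\Phi_{ij}^{-1}(\alpha)\,\mathrm{d}\alpha = \sum_{i=1}^{n} \int_0^1 \Phi_{ik}^{-1}(\alpha)\,\Psi_i^{-1}(\alpha)\,\mathrm{d}\alpha, \quad k = 1, 2, \ldots, p. \tag{34}
$$

Equations (32) and (34) contain p + 1 equations. Solving them gives the estimated values of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_p$, and hence the fitted multiple linear regression equation.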
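Every coefficient in Eqs. (32) and (34) is an integral of a product of inverse distributions, with the constant 1 playing the role of the regressor for β0. The Python sketch below assembles that (p + 1) × (p + 1) system for general p and solves it. It assumes NumPy, a midpoint-rule quadrature, and linear uncertain variables as inputs; the function name `deformation_system` and the data are hypothetical, with the data built so that the true parameters are (1, 0.5, 2).

```python
import numpy as np

# Midpoint grid on (0, 1) for approximating int_0^1 g(alpha) d alpha.
N_ALPHA = 10_000
ALPHA = (np.arange(N_ALPHA) + 0.5) / N_ALPHA


def linear_inv(a, b):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    return lambda alpha: (1 - alpha) * a + alpha * b


def deformation_system(x_invs, y_invs):
    """Assemble the (p + 1) x (p + 1) system formed by Eqs. (32) and (34).

    x_invs[i] is the list [Phi_{i1}^{-1}, ..., Phi_{ip}^{-1}] and y_invs[i]
    is Psi_i^{-1}. Index 0 stands for beta_0, i.e. the constant regressor 1.
    """
    p = len(x_invs[0])
    M = np.zeros((p + 1, p + 1))
    rhs = np.zeros(p + 1)
    for fs, g in zip(x_invs, y_invs):
        # Row 0 is the constant 1; row k holds Phi_{ik}^{-1} on the grid.
        phi = np.vstack([np.ones(N_ALPHA)] + [f(ALPHA) for f in fs])
        psi = g(ALPHA)
        M += phi @ phi.T / N_ALPHA  # integrals of Phi_{ik}^{-1} * Phi_{ij}^{-1}
        rhs += phi @ psi / N_ALPHA  # integrals of Phi_{ik}^{-1} * Psi_i^{-1}
    return M, rhs


# Hypothetical data (n = 6, p = 2) built so that Psi_i^{-1} equals
# 1 + 0.5 * Phi_{i1}^{-1} + 2 * Phi_{i2}^{-1}, i.e. true beta = (1, 0.5, 2).
x_invs = [[linear_inv(i, i + 1), linear_inv(i**2, i**2 + 2)] for i in range(1, 7)]
y_invs = [linear_inv(1 + 0.5 * i + 2 * i**2, 1 + 0.5 * (i + 1) + 2 * (i**2 + 2))
          for i in range(1, 7)]
M, rhs = deformation_system(x_invs, y_invs)
beta = np.linalg.solve(M, rhs)
print(beta)
```

Entry M[0, 0] equals n, matching the coefficient of β0 in Eq. (32). The matrix M is the same coefficient matrix that the auxiliary methods in Sect. 4 operate on.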

Other auxiliary solutions
In the equation deformation method for the multiple linear regression model, Eq. (30) can be regarded as a linear system of equations in $\beta_0, \beta_1, \ldots, \beta_p$. Using linear algebra (Department of Mathematics 2014), we develop the equation deformation method further and propose solving for the unknown parameters of the linear regression equation with Cramer's rule and with elementary transformations of matrices.

Cramer's rule
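For brevity, write $S_{jk} = \sum_{i=1}^{n} E[\tilde x_{ij}\tilde x_{ik}]$ and $T_j = \sum_{i=1}^{n} E[\tilde x_{ij}\tilde y_i]$ for $j, k = 0, 1, \ldots, p$, with the convention $\tilde x_{i0} \equiv 1$ (so that $S_{00} = n$ and $T_0 = \sum_{i=1}^{n} E[\tilde y_i]$). Equation (30) is then equivalent to

$$
\begin{pmatrix}
S_{00} & S_{01} & \cdots & S_{0p} \\
S_{10} & S_{11} & \cdots & S_{1p} \\
\vdots & \vdots & \ddots & \vdots \\
S_{p0} & S_{p1} & \cdots & S_{pp}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
=
\begin{pmatrix} T_0 \\ T_1 \\ \vdots \\ T_p \end{pmatrix}.
$$

Let A denote the coefficient matrix of this system. Its determinant is

$$
|A| =
\begin{vmatrix}
S_{00} & \cdots & S_{0p} \\
\vdots & \ddots & \vdots \\
S_{p0} & \cdots & S_{pp}
\end{vmatrix}.
$$

If $|A| \neq 0$, then by Cramer's rule the linear system has a unique solution.

Replacing the j-th column of |A| (the column of coefficients of $\beta_j$, $j = 0, 1, \ldots, p$) with the constant terms on the right-hand side of the system gives a determinant denoted $|A_j|$, i.e.,

$$
|A_j| =
\begin{vmatrix}
S_{00} & \cdots & S_{0,j-1} & T_0 & S_{0,j+1} & \cdots & S_{0p} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
S_{p0} & \cdots & S_{p,j-1} & T_p & S_{p,j+1} & \cdots & S_{pp}
\end{vmatrix}.
$$

By Cramer's rule, the solution of the linear system is

$$
\beta_j = \frac{|A_j|}{|A|}, \quad j = 0, 1, \ldots, p. \tag{39}
$$

Evaluating Eq. (39) gives the estimated values of the unknown parameters $\beta_0, \beta_1, \ldots, \beta_p$. Therefore, the linear regression equation can also be solved by Cramer's rule.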
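Eq. (39) translates directly into code: compute |A|, then for each j replace column j of A with the constant terms and take the ratio of determinants. The Python sketch below assumes NumPy, whose `numpy.linalg.det` computes the determinants; the function name `cramer_solve`, its tolerance `tol` and the small example system are hypothetical.

```python
import numpy as np


def cramer_solve(A, b, tol=1e-12):
    """Solve A beta = b by Cramer's rule: beta_j = |A_j| / |A| (Eq. (39)),
    where A_j is A with its j-th column replaced by the constant terms b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    if abs(det_A) < tol:
        raise ValueError("|A| is (numerically) zero; Cramer's rule does not apply")
    beta = np.empty(len(b))
    for j in range(len(b)):
        A_j = A.copy()
        A_j[:, j] = b  # replace column j by the constant terms
        beta[j] = np.linalg.det(A_j) / det_A
    return beta


# Hypothetical 2 x 2 system standing in for Eq. (30) with p = 1.
print(cramer_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))  # exact solution: (0.8, 1.4)
```

Cramer's rule needs p + 2 determinants of size (p + 1) × (p + 1), so for many unknown parameters elimination on the augmented matrix is usually cheaper. The fixed threshold `tol` is a simple heuristic and does not account for the scale of A.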

Elementary transformation of matrices
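We start with Eq. (30), which can be written as the matrix equation

$$
A\boldsymbol\beta = \boldsymbol b,
$$

where $\boldsymbol\beta = (\beta_0, \beta_1, \beta_2, \ldots, \beta_p)^T$, A is the $(p+1)\times(p+1)$ coefficient matrix of Eq. (30), and

$$
\boldsymbol b = \left( \sum_{i=1}^{n} E[\tilde y_i],\ \sum_{i=1}^{n} E[\tilde x_{i1}\tilde y_i],\ \ldots,\ \sum_{i=1}^{n} E[\tilde x_{ip}\tilde y_i] \right)^T
$$

is the vector of constant terms. The augmented matrix of the linear system is

$$
B = (A \mid \boldsymbol b).
$$

If the rank of the coefficient matrix A equals the rank of the augmented matrix B, and both equal the number of unknowns p + 1, then the linear system has a unique solution. The unknown parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ can then be estimated by applying elementary row transformations to B until its first p + 1 columns form the identity matrix; the last column then contains the solution. Therefore, the linear regression equation can also be solved by elementary transformations of matrices.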
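The procedure above is Gauss-Jordan elimination on the augmented matrix B. The following Python sketch applies only the three elementary row operations (swapping rows, scaling a row, adding a multiple of one row to another) and checks the rank condition along the way. It uses the standard library's `fractions.Fraction` so that every row operation is exact; the function name `solve_by_row_reduction` and the example system are hypothetical.

```python
from fractions import Fraction


def solve_by_row_reduction(A, b):
    """Gauss-Jordan elimination on the augmented matrix B = (A | b).

    Uses exact rational arithmetic and only elementary row operations.
    Returns the unique solution, or raises ValueError if rank(A) != rank(B)
    or rank(A) < number of unknowns.
    """
    m = len(A)  # m = p + 1 unknowns and equations
    B = [[Fraction(x) for x in row] + [Fraction(c)] for row, c in zip(A, b)]
    rank = 0
    for col in range(m):
        # Find a pivot row at or below position `rank` for this column.
        pivot = next((r for r in range(rank, m) if B[r][col] != 0), None)
        if pivot is None:
            continue
        B[rank], B[pivot] = B[pivot], B[rank]  # swap two rows
        lead = B[rank][col]
        B[rank] = [x / lead for x in B[rank]]  # scale the pivot row to 1
        for r in range(m):  # clear this column in every other row
            if r != rank and B[r][col] != 0:
                factor = B[r][col]
                B[r] = [x - factor * y for x, y in zip(B[r], B[rank])]
        rank += 1
    # Rows below `rank` are zero in their A-part; a nonzero constant there
    # means rank(B) > rank(A), i.e. the system is inconsistent.
    if any(B[r][m] != 0 for r in range(rank, m)):
        raise ValueError("rank(A) < rank(B): no solution")
    if rank < m:
        raise ValueError("rank(A) = rank(B) < p + 1: solution not unique")
    return [B[r][m] for r in range(m)]


print(solve_by_row_reduction([[2, 1], [1, 3]], [3, 5]))  # exact solution: 4/5, 7/5
```

Exact fractions avoid rounding in the row operations at the cost of speed; with floating-point data one would normally use partial pivoting instead.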

Numerical example
To verify the feasibility of the equation deformation method, we give an example with imprecisely observed data and compare the results with least squares estimation. We also numerically analyze the estimated expected values and variances of the disturbance terms using the methods of Lio and Liu (2018) and Liu and Yang (2020).
Assume that $(\beta_0, \beta_1) = (5, 2)$, so that the one-dimensional linear regression equation is

$$
y = 5 + 2x.
$$

For this model, we designed two sets of imprecisely observed data $(\tilde x_{ji}, \tilde y_i)$, $i = 1, 2, \ldots, 10$, $j = 1, 2$, as shown in Table 1. The first set, $(\tilde x_{1i}, \tilde y_i)$, $i = 1, 2, \ldots, 10$, is a normal dataset. The second set, $(\tilde x_{2i}, \tilde y_i)$, $i = 1, 2, \ldots, 10$, is called a singular dataset because it contains outliers at $i = 3, 5, 8$. We now compare the parameter estimates obtained by the equation deformation method with those obtained by least squares estimation. For the first dataset, the regression equations obtained by both methods coincide with the original equation, so we do not discuss them further. For the second dataset, the fitted linear regression equations obtained by the equation deformation method and by least squares estimation are shown in Table 2.
As Table 2 shows, the two fitted equations differ significantly, particularly in their constant terms, and their goodness of fit also differs greatly.
Table 3 shows the bias of the estimates of $\beta_0$ and $\beta_1$ relative to the corresponding values in the original equation, where the bias is defined as the estimated value minus the original value.
Table 3 shows that the equation deformation method handles data with singular values better: the biases of its coefficient estimates are small.
The estimated expected value of the disturbance term is almost zero, which indicates a good fit. The estimated variance of the disturbance term is large because the dataset contains three singular values and is therefore widely dispersed.
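The estimate of the disturbance's expected value, (1/n)Σ E[ε̂_i] (Lio and Liu 2018), can be computed from inverse distributions when the fitted slope is nonnegative: the residual ỹ_i − β0 − β1x̃_i is increasing in ỹ_i and decreasing in x̃_i, so its inverse distribution is Ψ_i⁻¹(α) − β0 − β1Φ_i⁻¹(1 − α). The Python sketch below covers only this expected-value estimate; the variance estimate follows Lio and Liu (2018) and Liu and Yang (2020) and is not reproduced. It assumes NumPy, a midpoint-rule quadrature and linear uncertain variables, and the data are hypothetical rather than the paper's Table 1.

```python
import numpy as np

# Midpoint grid on (0, 1) for approximating int_0^1 g(alpha) d alpha.
N_ALPHA = 10_000
ALPHA = (np.arange(N_ALPHA) + 0.5) / N_ALPHA


def linear_inv(a, b):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    return lambda alpha: (1 - alpha) * a + alpha * b


def residual_inv(x_inv, y_inv, beta0, beta1):
    """Inverse distribution of the residual y - beta0 - beta1 * x for independent
    x and y with beta1 >= 0 (increasing in y, decreasing in x)."""
    if beta1 < 0:
        raise ValueError("this sketch only covers beta1 >= 0")
    return lambda alpha: y_inv(alpha) - beta0 - beta1 * x_inv(1 - alpha)


def estimated_mean_disturbance(x_invs, y_invs, beta0, beta1):
    """e_hat = (1/n) * sum_i E[eps_hat_i], each expectation computed via Eq. (1)."""
    means = [np.mean(residual_inv(f, g, beta0, beta1)(ALPHA))
             for f, g in zip(x_invs, y_invs)]
    return float(np.mean(means))


# Hypothetical data: x_i ~ L(i, i + 1), y_i ~ L(5 + 2i, 7 + 2i); fitted y = 5 + 2x.
xs = [linear_inv(i, i + 1) for i in range(1, 6)]
ys = [linear_inv(5 + 2 * i, 7 + 2 * i) for i in range(1, 6)]
print(estimated_mean_disturbance(xs, ys, 5.0, 2.0))
```

For these inputs each residual is the linear uncertain variable L(−2, 2), so the estimated expected value is zero even though the residuals themselves are spread out, which is consistent with a near-zero mean coexisting with a positive variance.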
The numerical example shows that the equation deformation method is feasible, easy to understand and simple to compute.

Conclusion
In this paper, we discussed the uncertain regression model and proposed the equation deformation method for solving the unknown parameters of the linear regression equation. We then extended the equation deformation method to the multiple linear regression model. We also proposed solutions based on Cramer's rule and on elementary transformations of matrices; both can solve for the unknown parameters of the linear regression equation, but both require the reader to have a basic knowledge of linear algebra.
The equation deformation method does not require advanced mathematical knowledge such as calculus, so it is easy for readers to understand and use. When there are few unknown parameters, solving for them with the equation deformation method is relatively simple; when there are many, however, the amount of computation becomes large.
This paper has presented an equation deformation method for solving the unknown parameters of the linear regression equation. In future work, we will use MATLAB or Python to compute numerical solutions for the unknown parameters, so as to better handle linear regression equations with more complex data and more variables.