The analysis often utilizes multiple regression to explore the relationship between a dependent variable (Y) and a set of independent variables (X). The Ordinary Least Squares (OLS) estimator is a commonly favoured method for estimating the parameters of the regression model. Subject to specific assumptions, this estimator possesses compelling statistical properties, contributing to its status as one of the most robust and widely used tools for estimating regression models. However, OLS is subject to some common errors and limitations. Regression analysis provides insight into how the average value of the dependent variable changes when any one of the independent variables changes while the other independent variables are held constant. When exploring the relationship between a random variable Y and another variable X, which need not itself be a random variable, the equation linking Y to X is typically referred to as a regression equation.
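As a point of reference for the robust alternatives discussed below, the following minimal sketch illustrates OLS estimation on synthetic data; the data, variable names, and coefficient values are purely illustrative and are not taken from the study.

```python
import numpy as np

# Purely synthetic illustration: Y depends linearly on two regressors plus normal noise.
rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])        # design matrix with an intercept column
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# OLS estimate: beta_hat = argmin ||y - X beta||^2, i.e. (X'X)^{-1} X'y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS estimates:", beta_hat)
```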
2.3.1 Limitations of the OLS model in regression analysis
An often-encountered deviation from the traditional linear regression model arises when non-normally distributed error terms are present. If the assumption of normality of the errors is not met, the Ordinary Least Squares (OLS) estimator produces unreliable prediction estimates [26]. Furthermore, multiple regression results are unstable in the presence of outlying data points, and the occurrence of outliers in the data disrupts the assumption of normality. A potential approach for addressing outliers is to accommodate them, that is, to incorporate mechanisms that account for their influence; such accommodation is achieved through the application of various robust regression estimation methods. Additionally, another deviation from the assumption of independent error terms in the classical linear regression model arises in the form of autocorrelated errors. Autocorrelation refers to the correlation among observations in a time-ordered series and is commonly observed in time series data [27]. The classical linear regression model presupposes the absence of autocorrelation in the disturbances \({\epsilon }_{i}\). Symbolically,
\(E\left({\epsilon }_{i}{\epsilon }_{j}\right)=0\) ∀ i ≠ j (4)
When this assumption is breached, an autocorrelation problem results. Different corrective measures involving variable transformations have been developed. To mitigate autocorrelation, practitioners often resort to Feasible Generalized Least Squares (FGLS) procedures, such as the Cochrane-Orcutt or Prais-Winsten two-step procedures, as well as the Maximum Likelihood procedure or Two-Stage Least Squares. These methods rely on specialized estimators of the autocorrelation coefficient [28, 29].
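As an illustration of the FGLS idea, the sketch below implements a bare-bones Cochrane-Orcutt two-step correction for AR(1) errors using NumPy only; it is a simplified illustration under the assumption of a single AR(1) parameter, not the procedure of any cited work, and the function name is hypothetical.

```python
import numpy as np

def cochrane_orcutt(y, X):
    """Bare-bones two-step Cochrane-Orcutt correction for AR(1) errors (illustrative sketch)."""
    # Step 1: OLS fit and residuals.
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    # Estimate the AR(1) coefficient rho by regressing e_t on e_{t-1}.
    rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    # Step 2: quasi-difference the data and re-estimate by OLS.
    # (The first observation is dropped; an intercept column becomes (1 - rho),
    #  so its coefficient in the transformed model is alpha * (1 - rho).)
    y_star = y[1:] - rho * y[:-1]
    X_star = X[1:] - rho * X[:-1]
    b_fgls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return b_fgls, rho
```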
A notable obstacle in regression estimation is multicollinearity, which arises when the explanatory variables are correlated. In these instances, the regression coefficients demonstrate sizable standard errors, and in some cases, incorrect signs may appear [29]. Various techniques are available in the literature to tackle this challenge, with one example being the ridge regression estimator first introduced by Hoerl and Kennard [30]. An alternative remedy is suggested by Keijan [31], offering an estimator with a similar structure but differing from the ridge regression method of Hoerl and Kennard. Ayinde and Lukman [32] introduced alternative approaches, including generalized linear estimators (CORC and ML) and principal components (PCs) estimators, to combat multicollinearity in estimation methods. It is noteworthy that these challenges can coexist within a dataset. Holland [33] recommended a robust M-estimator for ridge regression, addressing both multicollinearity and outliers simultaneously. Askin and Montgomery [34] introduced ridge regression based on M-estimates. Midi and Zahari [35] introduced the Ridge MM estimator (RMM) by combining the MM estimator and ridge regression. Samkar and Alpu [36] proposed robust ridge regression methods using M, S, MM, and GM estimators. Maronna [37] suggested a robust MM estimator in ridge regression specifically tailored for high-dimensional data. Eledum and Alkhaklifa [38] presented a Generalized Two Stages Ridge Estimator (GTR) designed for multiple linear models facing challenges of both autocorrelation AR(1) and multicollinearity.
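For reference, the ridge estimator of Hoerl and Kennard has the closed form \({\widehat{\beta }}_{ridge}={\left({X}^{\prime}X+kI\right)}^{-1}{X}^{\prime}y\), where k > 0 is a shrinkage constant. The sketch below is a minimal illustration of that formula only; the choice of k and the usual practice of standardizing the predictors beforehand are left to the analyst.

```python
import numpy as np

def ridge_estimator(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; k > 0 shrinks the coefficients and
    stabilizes the solution when X'X is near-singular due to multicollinearity."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# k = 0 reproduces OLS; larger k trades a little bias for a large reduction in variance.
```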
M-estimation procedure
The commonly employed method for robust regression is M-estimation, initially introduced by Huber in 1964 [39], which exhibits efficiency comparable to Ordinary Least Squares (OLS). In contrast to minimizing the sum of squared errors, the M-estimate minimizes a function ρ of the errors. The objective function for the M-estimate is articulated as follows.
$${\sum }_{i=1}^{n}\rho \left(\frac{{e}_{i}}{s}\right)= {\sum }_{i=1}^{n}\rho \left(\frac{{y}_{i}-{x}_{i}^{\prime}\beta }{s}\right)$$ (5)
In this specific context, s represents a scale estimate, typically a robust estimate computed from the residuals (for example, the rescaled median absolute deviation). The function ρ delineates the impact of each residual on the objective function. An ideal ρ should exhibit the following characteristics:
$$\rho \left(e\right)\ge 0,\quad \rho \left(0\right)=0,\quad \rho \left(e\right)=\rho \left(-e\right),\quad \text{and } \rho \left({e}_{i}\right)\ge \rho \left({e}_{i}^{\prime}\right)\ \text{for}\ \left|{e}_{i}\right|\ge \left|{e}_{i}^{\prime}\right|$$ (6)
The set of normal equations required to address this minimization problem is derived by computing partial derivatives with respect to β and setting them equal to 0, yielding,
$${\sum }_{i=1}^{n}\psi \left(\frac{{y}_{i}-{x}_{i}^{\prime}\beta }{s}\right){x}_{i}=0$$ (7)
where ψ is the derivative of \(\rho\). The choice of the ψ function determines how heavily outliers are down-weighted. The nonlinear normal equations for M-estimates are typically solved using one of two methods: the Newton-Raphson method or iteratively reweighted least squares (IRLS). In the case of IRLS, the normal equations are formulated as follows, where W is a diagonal matrix of weights \({w}_{i}=\psi \left({e}_{i}/s\right)/\left({e}_{i}/s\right)\) recomputed from the residuals at each iteration:
$${X}^{\prime}WX\widehat{\beta }={X}^{\prime}Wy$$ (8)
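The following sketch shows one way Eq. (8) can be iterated in practice: the weights are recomputed from the current residuals, here with Huber's ψ (the tuning constant c = 1.345 is a common choice) and a rescaled-MAD scale estimate. It is an illustrative implementation under those assumptions, not the exact algorithm of the cited works.

```python
import numpy as np

def huber_weights(u, c=1.345):
    """IRLS weights w(u) = psi(u)/u for Huber's psi: 1 inside [-c, c], c/|u| outside."""
    w = np.ones_like(u)
    big = np.abs(u) > c
    w[big] = c / np.abs(u[big])
    return w

def m_estimate_irls(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares for the M-estimation normal equations X'WX beta = X'W y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # start from the OLS solution
    for _ in range(max_iter):
        e = y - X @ beta
        s = np.median(np.abs(e - np.median(e))) / 0.6745  # rescaled MAD as robust scale estimate
        w = huber_weights(e / s, c)
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```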
MM estimator
MM-estimation, introduced by Yohai in 1987 [40], represents a unique variant of M-estimation. MM-estimators combine the high asymptotic relative efficiency typically associated with M-estimators with the robustness characteristic of a class of estimators known as S-estimators. It distinguishes itself as one of the early robust estimators possessing both of these qualities simultaneously. The term 'MM' indicates that multiple M-estimation procedures are employed in the calculation of the estimator. Yohai [40] outlines the three stages that define an MM-estimator.
1. An initial estimate \(\underset{\_}{\beta }\) is obtained using a robust estimator with a high breakdown point; this initial estimator must be robust but need not be highly efficient. The residuals \({r}_{i}\left(\underset{\_}{\beta }\right)={y}_{i}-{x}_{i}^{T}\underset{\_}{\beta }\) are then computed from this initial estimate.
2. Using the residuals from the robust fit in stage 1, an M-estimate of scale with 50% BDP is computed as the solution of \(\frac{1}{n}{\sum }_{i=1}^{n}{\rho }_{0}\left(\frac{{r}_{i}}{s}\right)=k\), where k is a constant and \({\rho }_{0}\) denotes the objective function employed at this stage. This scale \(s({r}_{1}\left(\underset{\_}{\beta }\right),\dots ,{r}_{n}\left(\underset{\_}{\beta }\right))\) is denoted \({s}_{n}\).
3. The MM-estimator is now characterized as an M-estimator of β, employing a redescending score function, \({\phi }_{1}\left(u\right)=\frac{\partial {\rho }_{1}\left(u\right)}{\partial u}\), and the scale estimate \({s}_{n}\) obtained from stage 2 (a computational sketch of this final stage is given after Eq. (9)). An MM-estimator \(\widehat{\beta }\) is thus defined as a solution to
$${\sum }_{i=1}^{n}{x}_{ij}\,{\phi }_{1}\left(\frac{{y}_{i}-{x}_{i}^{T}\widehat{\beta }}{{s}_{n}}\right)=0,\quad j=1,\dots ,p$$ (9)
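The sketch below illustrates only this final stage: an IRLS M-step with a redescending Tukey bisquare ψ and the scale \({s}_{n}\) held fixed. The initial estimate and the stage-2 scale are assumed to be supplied (in practice from a high-breakdown S-estimator), and the tuning constant c = 4.685, a common choice for high efficiency at the normal model, is an assumption of the sketch.

```python
import numpy as np

def bisquare_weights(u, c=4.685):
    """Weights psi(u)/u for the redescending Tukey bisquare psi:
    (1 - (u/c)^2)^2 inside [-c, c], exactly 0 outside."""
    w = np.zeros_like(u)
    inside = np.abs(u) <= c
    w[inside] = (1.0 - (u[inside] / c) ** 2) ** 2
    return w

def mm_final_stage(X, y, beta_init, s_n, c=4.685, tol=1e-8, max_iter=100):
    """Stage 3 of MM-estimation: solve Eq. (9) by IRLS with the scale s_n fixed at its stage-2 value."""
    beta = np.asarray(beta_init, dtype=float)
    for _ in range(max_iter):
        u = (y - X @ beta) / s_n
        w = bisquare_weights(u, c)
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```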
S estimator
The S-estimator is a robust regression approach used in statistics to provide more reliable and resistant estimates of regression parameters when the data may contain outliers or influential points, and it is less sensitive to extreme and atypical observations. The S-estimator is derived implicitly from a scale statistic s(θ), where s(θ) is a variant of a robust M-estimate of the scale of the residuals \({e}_{1}\left(\theta \right),\dots , {e}_{n}\left(\theta \right)\).
S-estimators are defined through the minimization of the dispersion of the residuals:

Minimize \(S({e}_{1}\left(\theta \right),\dots , {e}_{n}\left(\theta \right))\), with final scale estimate \(\widehat{\sigma }= S({e}_{1}\left(\widehat{\theta }\right),\dots , {e}_{n}\left(\widehat{\theta }\right))\). The dispersion \(S({e}_{1}\left(\theta \right),\dots , {e}_{n}\left(\theta \right))\) is defined as the solution s of

\(\frac{1}{n}{\sum }_{i=1}^{n}\rho \left(\frac{{e}_{i}}{s}\right)=K\) (10)

where K is a constant and \(\rho \left(\frac{{e}_{i}}{s}\right)\) is the objective function applied to the scaled residuals.
Tukey’s biweight function is commonly suggested as this objective function and is defined as:
$$\rho \left(x\right)=\begin{cases}\dfrac{{x}^{2}}{2}-\dfrac{{x}^{4}}{2{c}^{2}}+\dfrac{{x}^{6}}{6{c}^{4}} & \text{for } \left|x\right|\le c\\[4pt] \dfrac{{c}^{2}}{6} & \text{for } \left|x\right|>c\end{cases}$$ (11)
Setting c = 1.5476 and K = 0.1995 gives a 50% breakdown point.
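As a quick numerical check of these constants (a sketch assuming SciPy is available), K should equal both the expectation of ρ under the standard normal distribution, which makes the scale estimate consistent at the normal model, and one half of the maximum value ρ(c) = c²/6, which is what yields the 50% breakdown point.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def tukey_rho(x, c):
    """Tukey's biweight objective function of Eq. (11)."""
    ax = abs(x)
    if ax <= c:
        return ax**2 / 2 - ax**4 / (2 * c**2) + ax**6 / (6 * c**4)
    return c**2 / 6

c = 1.5476
# Expectation of rho under the standard normal (consistency at the normal model).
K, _ = quad(lambda z: tukey_rho(z, c) * norm.pdf(z), -np.inf, np.inf)
print(f"E[rho(Z)]  = {K:.4f}")                # approximately 0.1995
print(f"K / rho(c) = {K / (c**2 / 6):.2f}")   # approximately 0.50, i.e. a 50% breakdown point
```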