Optimum adaptive bandwidth selection method for local fitting in kernel regression analysis of non-uniform data

Selection of a global bandwidth is common in kernel regression. A pointwise choice of local bandwidth, however, can lead to better results because it directly controls the smoothing of the signal. These smoothing bandwidths determine the filtering capacity for all signals and systems, and local selection offers greater adaptability across a variety of analyses, from one-dimensional to multidimensional problems, as well as across different branches of engineering for human-machine interaction. In this paper, we propose a new method, the optimum adaptive local bandwidth selection method (OALB), which depends on the bias-variance optimization ratio. It is based on Stanković's optimization of the bias-variance tradeoff of the signal (Stanković in IEEE Trans Signal Process 52:1228-1234, 2004). The bandwidth is calculated independently for every point based on the intersection of confidence intervals (ICI).

Research over the last three to four decades has sought a mathematical solution independent of the platform. Hence, we attempt to relate this broad class of representations in a simple yet sophisticated way across these emerging technologies, and this motivates us to propose a simple solution applicable to various platforms.
Nonparametric local polynomial regression (LPR) [1][2][3][4][5][6][7][8][9][10][11] has been widely applied in many research areas, such as data smoothing, density estimation, and nonlinear modeling. Given a set of noisy samples of a signal with equally or non-uniformly distributed data values, scale parameters or specific bandwidths are used to fit the samples locally by a polynomial using the least-squares (LS) criterion with a kernel (window) function. The bandwidth in LPR is closely related to the concept of scale in the wavelet transform. The scale parameter in the wavelet transform is usually constrained to be dyadic so that it forms a basis for expansion in the ℓ2 space, while the bandwidth in LPR is chosen flexibly to optimize the bias-variance tradeoff. The ICI rule, which reduces bias in regression, has proven excellent for such spatial adaptation in LPR in terms of asymptotic analysis [12,13].
Since various applications require accurate signal acquisition, processing, and reconstruction, it is crucial to denoise signals so as to achieve the best bias-variance tradeoff when estimating the local polynomial coefficients for the reconstruction of unknown non-stationary signals. For slowly varying parts of a signal, it is desirable to use a large window length, so that the additive noise is reduced to negligible values by averaging the signal values. In contrast, for fast-varying parts of a signal, a small kernel window is recommended, so that the bias error introduced by the low order of the fitting polynomial is reduced. The analytical relation for the locally optimal bandwidth can usually be derived easily. However, these formulae are not directly applicable in practice, as they always involve quantities that are difficult to estimate. Accordingly, various empirical methods have been proposed to determine the optimal bandwidth from a finite set of candidate bandwidth parameters.
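To make this tradeoff concrete, the following minimal sketch (with an assumed synthetic test signal, not one of the paper's benchmarks) smooths a noisy signal with a small and a large rectangular window and measures the error separately on a slowly varying and a fast-varying region:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2048
t = np.linspace(0.0, 1.0, n)
# Assumed illustrative signal: slowly varying first half, fast-varying second half
clean = np.where(t < 0.5, 1.0, np.sin(160 * np.pi * t))
noisy = clean + 0.3 * rng.standard_normal(n)

def local_average(x, h):
    """Zero-order local fit: rectangular kernel of half-width h."""
    k = 2 * h + 1
    return np.convolve(np.pad(x, h, mode="edge"), np.ones(k) / k, mode="valid")

results = {}
for h in (2, 16):
    est = local_average(noisy, h)
    flat = slice(0, n // 2 - 32)    # interior of the slow part
    fast = slice(n // 2 + 32, n)    # interior of the fast part
    results[h] = (np.mean((est[flat] - clean[flat]) ** 2),
                  np.mean((est[fast] - clean[fast]) ** 2))
```

The wide window wins on the flat part, where variance dominates, while the narrow window wins on the oscillating part, where bias dominates: exactly the tradeoff that motivates pointwise bandwidth selection.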
A fully adaptive, data-driven local bandwidth was suggested by Fan in a series of publications [1,2]. The Fan-Gijbels bandwidth selection (FGBS) method is based on the fact that the mean squared error (MSE) attains its minimum at the optimal local bandwidth: the optimal bandwidth is the one yielding the smallest MSE among a bandwidth set, i.e., a group of possible bandwidth values.
The empirical-bias bandwidth selection (EBBS) algorithm for multivariate LPR was suggested by Ruppert [6][7][8]. This innovative and useful method, however, suffers from high implementation complexity.
Lepskii et al. [12][13][14] eliminated the estimation of many asymptotic quantities required by the FGBS method. Lepskii's approach compares estimates within a set of bandwidths and chooses as optimal the largest bandwidth whose estimate remains comparable in performance to the estimates obtained with smaller bandwidths.
Goldenshluger and Nemirovski [15] and Katkovnik [16][17][18][19][20] proposed the intersection of confidence intervals (ICI) method. The optimal bandwidth is obtained by comparing the confidence intervals of the estimates at different bandwidths in a set of values. The importance of this method is that no estimation of the asymptotic bias or MSE is needed; hence, it results in lower arithmetic complexity.
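A minimal sketch of the ICI idea as just described (the threshold constant gamma and the running-intersection form are assumptions based on the description above):

```python
import numpy as np

def ici_bandwidth(estimates, sigmas, gamma=2.0):
    """Intersection of confidence intervals (ICI) rule, sketched.

    estimates[s] is the estimate at the s-th (increasing) bandwidth and
    sigmas[s] its standard deviation.  Returns the index of the largest
    bandwidth whose confidence interval still intersects all previous ones.
    """
    lower, upper = -np.inf, np.inf
    chosen = 0
    for s, (q, sd) in enumerate(zip(estimates, sigmas)):
        lower = max(lower, q - gamma * sd)   # running intersection of intervals
        upper = min(upper, q + gamma * sd)
        if lower > upper:                    # intervals no longer intersect
            break
        chosen = s
    return chosen
```

The returned index corresponds to the largest bandwidth whose confidence interval still intersects all previous ones, so no explicit estimate of the asymptotic bias or MSE is needed.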
Vieu [9] investigated kernel estimators of a regression function, employing the minimization of a local cross-validation criterion in a data-driven method for choosing bandwidths locally. The technique was shown to be asymptotically optimal in terms of local quadratic error.
Sheather and Jones [10] presented a data-based selection method for kernel density estimation. The method chooses the bandwidth to minimize a good-quality estimate of the Mean Integrated Squared Error (MISE), and showed considerable reliability in performance.
Hall et al. [11] proposed a bandwidth selection technique that injects estimates into the usual asymptotic representation of the optimal bandwidth, with two main distinctions. First, their work focused mainly on kernel density estimation, and second, the convergence to the optimal solution was improved relative to previous works.
Kai et al. [21] proposed a nonparametric regression mechanism termed "local composite quantile regression smoothing" to enhance LPR. The authors derived the asymptotic bias, variance, and normality of the estimates, and examined the asymptotic relative efficiency of the estimate with respect to LPR.
Zhang et al. [22][23][24] studied the problem of adaptive selection of kernels for multivariate LPR. The authors studied the applications of kernel selection in smoothing techniques and reconstruction of images containing noise. The resultant multivariate LPR technique is called the steering kernel-based LPR with Refined ICI technique (SK-LPR-RICI).
Papp et al. [25] considered the problem of optimizing the expected shortfall in the presence of an ℓ2 regularizer for uncorrelated Gaussian yields. The transition between the regularizer-dominated and data-dominated regimes is narrow; in this transition region, the tradeoff between the variance and the bias is balanced, determining which term dominates the estimation.
Chen et al. [26] presented a technique for selecting the optimal bandwidth for Kernel Density Functional Estimation (KDFE), where the optimal solution is found by minimizing the MSE of the KDFE. Two bandwidth selection methods are proposed, namely the direct plug-in method and the normal scale method. Simulation results showed better performance on bimodal distributions.
Cheng et al. [27] introduced a nonparametric localized bandwidth estimator for which an asymptotic theory was established. The work can be further investigated regarding the applicability of the localized bandwidth to kernel estimation in nonparametric regression.
Other researchers have used many different ideas or variations to propose adaptive bandwidth selection methods for LPR. These innovative methods are useful, but the error and implementation complexity involved are usually high. Hence, an optimized bandwidth selection algorithm and the bias-variance tradeoff problem are discussed here in detail. The proposed method does not require any explicit knowledge of bias estimation: the bias and variance expressions derived from a modified, optimized adaptive ICI rule are used for local bandwidth selection within LPR.
The bias-variance tradeoff and the ICI adaptive bandwidth selection algorithm were discussed by Stanković in [3], and they are used here to obtain the optimum adaptive local fit bandwidth selection ICI rules. In our work, we use the bias and variance expressions derived from the modified adaptive ICI rule for local bandwidth selection within LPR. We found the method to be comparatively better than previous approaches in terms of mean squared error (MSE), performance, memory requirement, and implementation complexity.
In this paper, LPR is used for efficient recursive implementation and online signal processing. In terms of memory requirements and delay time, the ICI-based method is shown to be simpler to implement than the other methods. The outcomes of our research are: (1) an ICI rule based on [3] that helps eliminate unwanted noisy signals; (2) the optimal smoothing bandwidth is calculated independently of the other values of the signal; (3) the pointwise smoothing obtained through the parameter h effectively calibrates the signal values to the desired localized optimum data value; (4) the window size can vary in every iteration, since noisy values are eliminated in every iteration; (5) there is no need to tune different algorithm values across iterations, as the algorithm itself generates the optimized h; (6) the signal reaches its optimized value under limiting conditions, because the variance of the signal directly modifies the value based on the curvature at the point. This follows from the optimum selection of the variance and smoothing parameters at every data point.
The paper is structured as follows. In Section II, we review the bias-variance method. Section III describes the modification and the procedure adopted to obtain optimum adaptive values of the smoothing bandwidth, as well as the steps involved in calculating the results. Simulations on some standard signals are carried out and presented in Section IV with tabulated results, and Section V concludes the paper. We begin with a brief introduction of the bias-variance method [3].

Review of ICI rules [3]
Consider an unknown signal f(k) superimposed with stationary noise ε(k), and a time-dependent quantity Q(k) that we want to obtain from this noisy signal. Let the bias and variance depend on the smoothing parameter h. Since bias and variance are mutually related, one increasing while the other decreases with h, their nature is assumed to take the simple forms

bias(k, h) = B(k) h^n,  (2)
σ²(h) = V h^(−m),  (3)

where B(k) and V, which depend on f(k), are quantities that cannot be known in advance, and n, m are positive integers.
The quantities in Eqs. (2) and (3) are unknown, and their form depends on the input data. A number of researchers have used forms of these equations based on kernel parameters, even though they are not known a priori [6-8, 12, 14]. Instead of such complicated assumed forms, we start with the simplest representation: the dependence of these equations on the unknown input signal values, and the reciprocal relation of bias and variance with respect to the smoothing bandwidth h.
The mean squared error (MSE) then takes the form

MSE(h) = bias²(k, h) + σ²(h) = B²(k) h^(2n) + V h^(−m).  (4)

The minimum of the MSE is obtained by differentiating with respect to h at h = h_opt(k) and equating to zero.
For the optimum value, multiplying by h and rearranging, we obtain the relationship between variance and bias,

2n bias²(k, h_opt) = m σ²(h_opt),  (5)

so that at h = h_opt the bias-to-standard-deviation ratio, |bias(k, h_opt)|/σ(h_opt) = √(m/(2n)), is independent of h. Now a set H of discrete values of the parameter h is considered as a first approximation,

H = {h_s = a^s h_0, s = 1, 2, ...},  (6)

with a > 1 and h_0 > 0. The general idea is to find an accurate value of h_opt, but the exact optimal choice h_opt is in general not equal to any of the values contained in the set H. To relate h_opt to the values in H, assume that h_opt is close to a parameter h_{s+} belonging to H, h_{s+} ∈ H, i.e., h_{s+} ≈ h_opt. Thus, we can write h_{s+} = a^p h_opt, where p is a constant close to 0. In this way, fine variations of the signal values within H are captured by s, and the coarse variation around h_opt is governed by the factor a^p.
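The optimum and the bias-to-variance ratio described above can be checked numerically; the power-law forms follow the review above, and the constants below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Assumed illustrative constants: bias(k,h) = B*h**n_, variance = V*h**(-m_)
B, V, n_, m_ = 1.5, 0.8, 2, 1

# Closed-form minimizer of MSE(h) = (B*h**n_)**2 + V*h**(-m_)
h_opt = (m_ * V / (2 * n_ * B**2)) ** (1.0 / (2 * n_ + m_))

def mse(h):
    """MSE as bias squared plus variance under the assumed power laws."""
    return (B * h**n_) ** 2 + V * h ** (-m_)

# Numerically confirm that h_opt minimizes the MSE over a fine grid
grid = np.linspace(0.5 * h_opt, 2.0 * h_opt, 2001)
h_grid_min = grid[np.argmin(mse(grid))]

# At the optimum, bias^2 / variance equals m/(2n), independent of B and V
ratio = (B * h_opt**n_) ** 2 / (V * h_opt ** (-m_))
```

Changing B and V moves h_opt, but the ratio stays at m/(2n), matching the statement that the bias-to-standard-deviation ratio at the optimum does not depend on h.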
To get the relationship between two consecutive values, confidence intervals of the random variable are introduced; these intervals are central to the algorithm. The estimate is a random variable distributed around the true value y with bias(k, h_s) and standard deviation σ(h_s). Suppose that the estimate lies within κ standard deviations of its mean with probability P(κ), where κ is selected so that P(κ) → 1, i.e., so that the true values are captured.
The confidence interval of the estimate for the parameter h_s ∈ H is defined by D_s = [L_s, U_s], with

L_s = Q̂_s(k) − (κ + Δκ) σ(h_s),  U_s = Q̂_s(k) + (κ + Δκ) σ(h_s),  (12)

which allows the distribution function to vary through this simple relation.
Here, Q(k) has an estimate Q̂_s(k), obtained with the parameters h = h_s and σ(h_s).
To relate two consecutive values in H, consider two consecutive indices s − 1 and s and their confidence intervals D_{s−1} and D_s. When s ≪ s+, negligible bias is present [see Eq. (10)]; thus, Q(k) ∈ D_s with probability P(κ + Δκ) → 1. The intersection of consecutive confidence intervals is then non-empty, D_{s−1} ∩ D_s ≠ ∅, since at least the true value Q(k) belongs to both intervals. For s ≫ s+, the variance is negligible and the bias becomes a large quantity. As a result, for larger s the intersection of consecutive confidence intervals eventually becomes empty, D_{s−1} ∩ D_s = ∅, for finite (κ + Δκ).
As the algorithm suggests, we need to consider only the confidence intervals of two consecutive values of s.
When there is a positive bias, this condition means that the minimum upper bound, min{U_{s+−1}}, is always greater than the maximum lower bound, max{L_{s+}}, i.e., min{U_{s+−1}} ≥ max{L_{s+}}. The condition for non-intersection is max{U_{s+}} < min{L_{s++1}}. Equation (11) gives the maximal and minimal values of Q̂_s(k); substituting these values into (10) turns the above two inequalities into equations. Because these inequalities describe the worst case of existence, the algorithm parameters are evaluated by applying equalities in these equations, and the parameters κ and Δκ are obtained by using (14) and (15). The parameter κ in D_s can then be determined for two successive values of s, for which the sequence of pairs of confidence intervals D_{s−1} and D_s intersects up to s = s+. The intervals D_{s−1} and D_s intersect when

|Q̂_s(k) − Q̂_{s−1}(k)| ≤ (κ + Δκ) (σ(h_{s−1}) + σ(h_s)).  (18)

This is the key condition by which the algorithm separates the signal values point by point.
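The consecutive-interval test can be written compactly; this is a sketch, with the threshold form assumed from the discussion above (Δκ defaults to zero here):

```python
def intervals_intersect(q_prev, q_cur, sd_prev, sd_cur, kappa=2.58, dkappa=0.0):
    """Consecutive-interval ICI test (assumed form): D_{s-1} and D_s
    intersect iff the two estimates differ by no more than the combined
    half-widths of their confidence intervals."""
    g = kappa + dkappa
    return abs(q_cur - q_prev) <= g * (sd_prev + sd_cur)
```

Two symmetric intervals [q − g·sd, q + g·sd] overlap exactly when |q_cur − q_prev| ≤ g (sd_prev + sd_cur), which is why the test needs only the estimates and their standard deviations.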
The factor a^p is selected from the set of values proportional to h, and the next value of σ(h_s) is then related through (κ + Δκ); Eq. (18) is updated accordingly for these new values. In this way, we obtain the smallest difference of signal values based on h.

Optimum adaptive bandwidth selection method
As the above optimization indicates, varying the values of m, n, and a does not affect the selection of the desired adaptive pointwise parameters. Hence, these parameters are kept constant, and the values of s and p are searched from the combination of the above equations. The κ is obtained from Eq. (17); although this places some burden on the processing part of the algorithm, it is beneficial as we approach the near-optimum limiting values of the desired confidence intervals. Equation (18) is used to check the proper values of h, which is the key parameter for the estimation of the pointwise bandwidth of the signal. The equation for D, giving the upper and lower values of the confidence interval in the ICI rule, is derived from Eq. (12).
The κ determines the separation of consecutive points. Points that belong to the same group, e.g., all the points on a flat portion of the curve, show little variation in this parameter, i.e., the variance is practically constant over that portion of the curve. Points on sharply varying slopes of the curve show different values of κ. Grouping such points is therefore difficult, as the corresponding portion of the resulting curve departs more from the actual curve; the anisotropy of the curve is thus difficult to adjust.
In selecting the data points of a group of a particular dimension, the ICI rule is checked for every data point. Only the points that fall within the ICI limits contribute; the rest are treated as outside values. The same kind of classification is observed in kernel-based selection of the points of a group: points placed close together in a group contribute higher weights to the kernel, while points far from the center point contribute little, provided they satisfy the constraint of the ICI rule.
Since the procedure for obtaining the smoothing factor h is independent of the other data points, and is based on the selection rule of Eq. (18), which is related to the error function, we obtain the minimum optimal smoothing bandwidth at each location. Our logic for selecting the parameters of the optimum adaptive ICI rule is therefore as follows.
The procedure to implement the OALB-ICI-based method is as follows: 8. If the condition is satisfied, use Eq. (12) for the upper and lower limits.
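A hedged sketch of a pointwise ICI-style bandwidth search consistent with the description above; the window shape, scale grid, and stopping rule are assumptions rather than the paper's exact step list:

```python
import numpy as np

def oalb_ici_denoise(y, noise_sd, kappa=2.58, a=2, h0=1, n_scales=6):
    """Pointwise bandwidth selection in the spirit of the OALB-ICI
    procedure (sketch).  For each sample, symmetric windows of half-width
    h = h0 * a**s are tried, and the largest h passing the consecutive
    confidence-interval test is kept."""
    n = len(y)
    out = np.empty(n)
    for k in range(n):
        prev_q = prev_sd = None
        best = y[k]
        for s in range(n_scales):
            h = h0 * a**s
            lo, hi = max(0, k - h), min(n, k + h + 1)
            q = y[lo:hi].mean()                # zero-order local fit
            sd = noise_sd / np.sqrt(hi - lo)   # std. dev. of the local mean
            if prev_q is not None and abs(q - prev_q) > kappa * (sd + prev_sd):
                break                          # intervals stopped intersecting
            best, prev_q, prev_sd = q, q, sd
        out[k] = best
    return out
```

The bandwidth at each point is chosen independently of the other points, as the procedure requires: the inner loop enlarges the window until the consecutive confidence intervals fail to intersect, then keeps the last accepted estimate.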
As the procedure shows, the method reduces the error in the signal very rapidly and provides a high gain in denoising the signal at every point. The denoising process is based on grouping the points according to the variance of the noise. The method reaches optimized values in only a few iterations; note also that the smoothing bandwidth is either kept constant throughout the regression or is a multiple of a constant value. These multiplying values in the various iterations are selected so as to reduce the noise.

Simulation results and comparisons
The local bandwidth, and its use in reconstructing the original signal from noisy values by different kinds of methods, has been evaluated in simulation. Four test signals are used, namely Blocks, Heavisine, Bumps, and Doppler; their definitions are given in [1,15] and [28].
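As an illustration, the 'Blocks' test signal can be generated following the standard Donoho-Johnstone definition; the breakpoint positions and jump heights below are reproduced from the wavelet literature and should be treated as assumed constants, not as values stated in this paper:

```python
import numpy as np

def blocks(n=1024):
    """'Blocks' test signal: a sum of scaled unit steps K(t) = (1 + sgn t)/2
    at the (assumed) standard Donoho-Johnstone positions and heights."""
    t = np.arange(n) / n
    pos = [0.1, 0.13, 0.15, 0.23, 0.25, 0.40, 0.44, 0.65, 0.76, 0.78, 0.81]
    hgt = [4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 2.1, -4.2]
    f = np.zeros(n)
    for p, h in zip(pos, hgt):
        f += h * (1 + np.sign(t - p)) / 2  # step of height h at position p
    return f
```

The result is piecewise constant with one jump per breakpoint, which is what makes Blocks a natural stress test for jump-adaptive bandwidth selection.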

Variation in different algorithms
We consider the experiment in [23]. The signals with white Gaussian noise of unit variance (σ² = 1) and zero mean on the interval (0, 1] are considered here. The observation data are uniformly sampled with 1024 data points; hence, the sample length is 1024. The test signals and their corresponding noisy signals are plotted in Figs. 1a, b, 2a, b, 3a, b, 4a, and b, respectively. Our analysis does not need threshold values, but to compare with the other methods in [23] the parameter κ is selected as 2.58, so that Q lies in the confidence interval with 99% probability (Pr(2.58) = 0.99) [23]. To compare the results, we evaluated the following adaptive bandwidth selection methods (with abbreviations and reference numbers): FGBS [1,2]: the method by Fan (RSC-based bandwidth selector with L = 8); TICI: the traditional ICI-based bandwidth selector; RICI [23]: the refined ICI-based bandwidth selector.
OALB-ICI: the proposed optimum adaptive local fit ICI-based bandwidth selector of this paper. Figures 1, 2, 3, and 4 show the results of the above adaptive bandwidth selectors for the four test signals. Even though their techniques and criteria differ, their results are satisfactory. Most of the signal details are preserved by all these methods, while the additive noise is successfully suppressed, even though the adaptive pointwise bandwidths, which reflect the local characteristics of the observation data, take different forms across methods, and the same data width is used in the results. Mostly flat areas of the signal receive increased bandwidths in some of the results, so more data samples are included to eliminate the additive noise; sharp or abrupt changes necessitate a small bandwidth to reduce the estimation bias. In the experiment, the proposed method's smoothing bandwidth ranged between 0 and 10⁻⁸. However, the adaptive bandwidths obtained by these methods show considerable differences. First, the optimal bandwidths obtained by RICI are smaller than those of FGBS due to the refining operation.
Compared with ICI, the pointwise bandwidths of FGBS are smoother because smoothing operations are carried out over a series of subintervals while calculating the optimal bandwidth. Hence, at sudden changes such as jump or step discontinuities, FGBS has a weaker response than the ICI-based methods, which compute the optimal bandwidth independently for each data point. For the 'Blocks' signal, which has many jump discontinuities between flat areas, this phenomenon is observed most clearly in the result. All ICI-based selectors automatically choose small bandwidths around the jump discontinuities and large bandwidths in the flat areas, as shown in the figures. In the OALB-ICI method, the algorithm tests the data points independently of successive values, so h is optimized under the variance condition at each point.
As the procedure indicates, κ is selected as a function of the noise variance and is kept at a low value; the term (κ + Δκ) then adjusts itself, and the optimum adaptive pointwise smoothing bandwidth is calculated as stated above. In this way, the noise variance is automatically taken into account at each point through the explicitly used lower and upper bounds of the confidence interval. The MSE is used to measure the performance quickly, and the features for the four test signals are listed in Table 1.

Observations on the parameter κ
A small value of κ results in small adaptive bandwidths, and only little improvement can be achieved by the traditional or the refined conventional ICI-based adaptive bandwidth selectors. When κ is selected as a large value, considerable improvements in the MSE values can be achieved. In our proposed method, κ is made a function of the noise variance itself, and thus the bandwidth shows a significant effect depending on the noise values.
In the method adopted to calculate the parameter, there is a strong relationship among the various calculated parameters. The κ helps reduce the noise between consecutive values. Its dependence on the variance enables the algorithm to reach optimum values at every data point, and to reach them as early as possible. In terms of performance, it provides large gains when the variance is properly selected at every point. A change in the κ value essentially captures the relation between consecutive values and the variance; for example, an abrupt change in κ also indicates a large change in the variance. It thus relates the parameters to the results in the simplest way. As observed, it provides better logarithmic or exponential characteristics in both phase and gain at the points, making it possible to obtain better upper and lower values of the confidence interval range around the mean values.
As the observations indicate, keeping κ constant is not possible, because there is always a strong relation between subsequent data positions; there are only a few places where the correlation has to be eliminated. As in a wavelet transform adapting to abrupt changes, D_s = [L_s, U_s] provides successive adaptation to the optimum values in the range because of its powerful nonlinear variable gain.
When comparing the performance of the different methods, the FGBS method, the traditional ICI, the refined ICI-based method [23], and our optimum adaptive local fit ICI method, the situation is not as clear as one might expect. In Table 1, the optimal MSE varies from signal to signal. For the Doppler signal, with its continuous point-to-point variations, a smaller κ is preferred so that the constantly varying local bandwidth can be represented; in this case smaller bandwidths are required, which reduces the estimation bias. In contrast, for 'Heavisine', which has slow variations, a large value of κ should be preferred, leading to larger bandwidths that reduce the additive noise [23,24]. In our optimum adaptive bandwidth selection, we make κ a function of the noise values; hence, κ varies at every point on the curve. Comparing with the refined ICI methods, where a fully data-driven implementation chooses κ adaptively, the refined ICI-based bandwidth selector showed varying results over its supported values. The optimum adaptive method, in contrast, shows uniformity in the results for the test signals, with uniform denoising across various values of the noise parameters. We can see that the optimum adaptive local fit ICI method achieves a relatively good result and shows uniformity across all the signals in reaching the expected MSE.

Recursive implementation
Using the test signal 'Blocks' with white Gaussian noise of zero mean and unit variance, the recursive implementation of LPR is demonstrated. In the work by Z. G. Zhang [23], the bandwidth functions h_RSC and h_MSE are recursively estimated in the interval as in FGBS, while in the TICI method the values are selected from a threshold-parameter list and vary accordingly; similar to TICI, RICI uses h_opt. The subinterval length L = 8 in the FGBS method and the threshold parameter κ = 2.58 have been used. The forgetting factors for the ICI methods and the FGBS method are λ_ICI = 0.95 and λ_RSC = λ_MSE = 0.95/L ≈ 0.12, respectively. The estimation window size is N_w = 32, and the forgetting factor in the recursive estimation of the noise variance is λ_σ = 1 − 1/N_w ≈ 0.97. Figure 5 shows the estimation results of the FGBS, TICI, and RICI methods and the corresponding local bandwidth functions for the Blocks signal. All these methods work satisfactorily, and the local bandwidth is adjusted adaptively. The MSE values are 2.2265, 2.1165, and 1.0417 for the FGBS, TICI, and RICI methods, respectively, while the MSE of OALB-ICI is 0.0140, with the window length kept constant throughout the recursion. Table 2 summarizes the observations in the experiments for these classes of bandwidth selectors.
In our method, we rely on the noisy data itself, and the selection of the signal value at a point is independent of the MSE of the other data values. We characterize κ as a function of the noise variance, as given in the summary of the procedure used to obtain the local bandwidth. The filter used is then the Nadaraya-Watson kernel regression, which is based on the known observation values and is categorized as a zero-order approximation to the multiple nonparametric regression function. The experiments show that it is a relevant and practical method in a number of situations, including approximation, search, array formation for engineering work, etc. It is represented by the equation

ẑ(x) = Σ_i K(x − x_i) y_i / Σ_i K(x − x_i),

where ẑ are the expected values of the denoised signal (the estimated values), and K(·) is the kernel function based on the distance of the data points from the centered location where the estimation is carried out. We used the Gaussian form of the kernel.
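A minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel, as described above (vectorized over the evaluation points):

```python
import numpy as np

def nadaraya_watson(x, y, x_eval, h):
    """Zero-order kernel regression: kernel-weighted average of the
    observations y at sample locations x, evaluated at x_eval."""
    # Pairwise Gaussian weights between evaluation points and samples
    w = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)
```

For a constant input the estimate reproduces that constant exactly, since the weights cancel in the ratio; this is the sense in which the estimator is a zero-order local approximation.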

Denoising example based on bumps and audio signal
The signal shown in Fig. 6 is corrupted by white Gaussian noise with an SNR of 10 dB; the corresponding denoised signal is shown in the same figure. The denoised SNR and MSE are 23.04 dB and 0.01609, respectively. The window length is varied across the iterations in this case. This demonstrates that the method is appropriate for denoising unknown signals. An audio signal, shown in Fig. 7, is used as an example of time-varying signals. This signal is corrupted by white Gaussian noise with an SNR of 10 dB, and the corresponding denoised result is also shown in Fig. 7. Here, the window length is five successive samples, and the signal is denoised without any iteration. This shows that the method is suitable for denoising time-varying signals within the time constraints of real-time signal processing.

Algorithm reliability and complexity analysis
The algorithm reliability as studied by Stanković [3] suggests that, under limiting conditions, the probability of a 'false result' is zero for an error distribution limited to |x| ≤ (κ + Δκ) σ(h_s); e.g., it is impossible to get a false result for a uniformly distributed error with (κ + Δκ) > √3. A Gaussian-distributed error gives a false-result probability of less than 0.0001, while a heavy-tailed Laplacian error gives a probability of not more than 0.05 at a point.
Computation of Q̂_s at a given point involves a local sum of successive points in a window. A window around the centered value can be constructed to store the values; we store sufficient consecutive values and the sums of the corresponding window values. This computation can be achieved by known methods [29] in O(n(log n)^(N−1)) time, and the windows containing y can be retrieved in O(n(log n)^N) time. Thus, the estimated values in a window can be computed in O(n(log n)^N) time after a preprocessing step of O(n(log n)^(N−1)). The overall complexity of the algorithm is the number of subsequent values taken into consideration times the complexity of the Nadaraya-Watson method, which has preprocessing complexity O(n(log n)^(N−1)), storage complexity O(2^n), and functional computation complexity O((log n)^N).

Conclusions
Features of different adaptive bandwidth selection methods for LPR have been studied, and these methods have been compared in terms of performance and implementation complexity using test value sets. A new optimum adaptive local fit ICI-based bandwidth selector and its recursive implementation have been presented. Simulations show that the performance of the proposed optimum adaptive local fit ICI-based bandwidth selection method is considerably better than that of the other ICI methods. These studies have various applications, such as interpolation of missing data values, testing of patch antennas and PCBs at various bandwidths, image analysis and smoothing, and multiresolution analysis (MRA) of non-uniform data. The proposed method does not involve any complication or higher-order approximation; it is in fact the simplest one and can easily be embedded in combined hardware and software implementations.