A Fuzzy Linear Regression Model With Functional Predictors And Fuzzy Responses

A novel functional regression model was introduced, where the predictor was a curve linked to a scalar fuzzy response variable. An absolute error-based penalized method with the SCAD penalty was proposed to evaluate the unknown components of the model. For this purpose, the concept of a fuzzy-valued function was developed and discussed. Then, a fuzzy law of large numbers notion was proposed to estimate the fuzzy-valued function. Some common goodness-of-fit criteria were also used to examine the performance of the proposed method. The efficiency of the proposed method was then evaluated through two numerical examples, including a simulation study and an applied example in the scope of watershed management. The proposed method was also compared with several common fuzzy regression models in cases where the functional data were converted to scalar values.


Introduction
As the most basic and commonly used statistical technique, multiple regression analysis is utilized to estimate the relationships between one or more predictors (independent variables) and a response (dependent variable). Recently, many techniques have been proposed to combine conventional statistical regression models with fuzzy set theory. In this regard, Chukhrova and Johannssen [1] provided a comprehensive systematic review of the then-available methodologies and applications focused on fuzzy regression analysis as of 2019. Such studies can be classified as (1) possibilistic approaches, where linear and non-linear programming methods minimize the total spread of the fuzzy parameters, subject to supporting the observations at some specific levels (see for example Refs. [2,3,4,5,6,7,8,9,10,11,12,13,14]), (2) fuzzy least squares and fuzzy least absolutes parametric/non-parametric methods, where the gap between the predicted fuzzy values and the available fuzzy data is minimized with regard to various distance measures between two fuzzy numbers, covering the most commonly used linear and non-linear models (see for instance Refs. [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]), and (3) machine learning techniques, such as evolutionary algorithms [30,31,32,33,34], support vector machines [35,36,37,38], and neural networks embedded in fuzzy regression analysis [39,40,41,42,43]. Evolutionary algorithms use ideas and terminology relevant to biological evolution, such as mutation, recombination, reproduction and selection. Here the candidate solutions of the optimization problem represent individuals in a population, and a fitness function is used to determine the quality of each candidate solution.
Cheng and Lee [44] investigated the two most basic non-parametric regression techniques, namely k-nearest neighbor smoothing and kernel smoothing, for a problem with crisp input and fuzzy output. They further formulated an algorithm to select the best smoothing parameters by minimizing a cross-validation criterion. Wang et al. [45] proposed a fuzzy non-parametric model with crisp input and LR-fuzzy output based on the local linear smoothing technique, with a cross-validation procedure to select the optimal value of the smoothing parameter. Additionally, Hesamian and Akbari [47] and Yang and Yin [47] proposed fuzzy multiple regression models with fuzzy varying coefficients based on exact predictors and fuzzy responses.
All of the above-mentioned fuzzy regression models rely on non-functional data. However, functional regression analysis [48] has received considerable attention in various fields of application [49,50,51]. The basic idea behind functional regression analysis is to express each predictor in a repeatedly measured set of data as a smooth function and then draw information from the collection of functional data. The term "functional" data traditionally refers to data measured over an interval or a higher-dimensional domain. Such data are recorded at discrete times and converted into a continuous function in order to (1) allow record evaluation at any point in time, (2) evaluate rates of change, (3) reduce noise, and (4) allow registration onto a common time-scale. From another point of view, regression models with functional data can be classified into three classes: those with (1) functional predictor(s) and scalar response [52,53], (2) scalar predictor(s) and functional response [54,55,48,49,56,57,58,59,60,61] and (3) functional predictor(s) and functional response [62,63,64]. Many of these methods are direct extensions of the classical least squares, principal component, and partial least-squares procedures.
Previous studies on fuzzy regression analysis have been conducted on the basis of non-functional scalar/fuzzy quantities with exact/fuzzy or exact/fuzzy-valued (varying) coefficients. In this paper, however, a fuzzy functional linear regression modeling strategy is proposed based on functional predictors, an LR-fuzzy response and fuzzy varying coefficients. For many experts, a simple way to capture imprecision in a vague process is to express it as an LR-fuzzy number. Therefore, LR-fuzzy numbers play an important role in many real-life applications of fuzzy inference. To evaluate the unknown components of the proposed fuzzy functional coefficients, a selection criterion is proposed herein based on absolute error regularization and the SCAD penalty.
The rest of this paper is organized as follows: Section 2 reviews some general concepts relevant to fuzzy numbers. In Section 3, a methodology is proposed to estimate the fuzzy coefficients of a fuzzy regression model with functional predictors and fuzzy responses. A hybrid algorithm is also presented to evaluate the components of the proposed fuzzy functional regression model. Section 4 presents two numerical examples to evaluate the performance of the proposed method compared to other fuzzy multiple/non-linear/non-parametric regression methods in terms of some common performance measures. Finally, the main contributions of this paper are summarized in Section 5.

Fuzzy numbers
This section reviews some basic definitions of fuzzy numbers based on [65,66]. A fuzzy set $A$ of $\mathbb{R}$ (the real line) is defined by its membership function $\mu_A : \mathbb{R} \to [0, 1]$. In addition, a fuzzy set $A$ of $\mathbb{R}$ is called a fuzzy number (FN) if it is normal, i.e. there is a unique $x^*_A \in \mathbb{R}$ so that $\mu_A(x^*_A) = 1$; and for every $\alpha \in [0, 1]$, the set $A[\alpha] = \{x \in \mathbb{R} : \mu_A(x) \ge \alpha\}$ (with $A[0]$ taken as the closure of the support of $A$) is a non-empty compact interval in $\mathbb{R}$. The set of all fuzzy numbers is denoted by $F(\mathbb{R})$. It is worth noting that fuzzy numbers are approximate assessments, given by experts and accepted by decision-makers when access to more accurate values is either impossible or unnecessary. To simplify the representation and handling of fuzzy numbers, several authors have captured the information contained in a (unimodal) fuzzy number using a functional parametric form known as the LR-fuzzy number $A = (a; l_a, r_a)_{LR}$. The membership function of an LR-fuzzy number $A$ is defined by:

$$\mu_A(x) = \begin{cases} L\left(\dfrac{a - x}{l_a}\right), & x \le a, \\[2mm] R\left(\dfrac{x - a}{r_a}\right), & x > a, \end{cases} \tag{1}$$

where $a \in \mathbb{R}$, $l_a > 0$ and $r_a > 0$ are called the mean value, left spread and right spread of $A$, respectively. The shape function $L$ (or $R$) is a decreasing function from $\mathbb{R}^+ \to [0, 1]$ such that $L(0) = 1$ and $L(1) = 0$. The LR-fuzzy number has been applied in various problems as a general model of imprecision. In this paper, the most commonly used LR-fuzzy numbers (with $L(x) = R(x) = \max\{0, 1 - x\}$), the so-called triangular fuzzy numbers (TFNs), are employed to handle the imprecision in the data sets during numerical evaluations. The membership function of a triangular fuzzy number, denoted by $A = (a; l_a, r_a)_T$, is given by:

$$\mu_A(x) = \begin{cases} \dfrac{x - (a - l_a)}{l_a}, & a - l_a \le x \le a, \\[2mm] \dfrac{(a + r_a) - x}{r_a}, & a < x \le a + r_a, \\[2mm] 0, & \text{otherwise}. \end{cases} \tag{2}$$

Definition 2.1. [65] For a given $A \in F(\mathbb{R})$, the mapping $A_\alpha : [0, 1] \to \mathbb{R}$ is called the α-values of $A$, defined as follows:
where $A^L[\alpha]$ and $A^U[\alpha]$ denote the lower and upper limits of the α-cuts of $A$.
For instance,
1. If $A = (a; l_a, r_a)_T$ is a triangular fuzzy number, then: , then:
Remark 2.1. Since $A_\alpha$ is a decreasing function of $\alpha$, the relationship between α-values and α-cuts can be expressed as: 1] , the membership function of $A$ can be evaluated as follows:
In addition, for all $A, B \in F(\mathbb{R})$, $\lambda \in \mathbb{R}$ and $\alpha \in [0, 1]$, the addition and scalar multiplication operations between $A$ and $B$ can be evaluated as follows:
Such arithmetic operations will be applied to suggest a fuzzy multivariate regression model in the next section.
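To make the triangular fuzzy numbers and their arithmetic concrete, the following is a minimal Python sketch. It assumes the standard interval arithmetic on α-cuts for TFNs (sums add centres and spreads; a negative scalar swaps the left and right spreads), which may differ in notation from the α-value formulation used in this section. The class name `TFN` and its methods are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TFN:
    """Triangular fuzzy number (a; l, r)_T with centre a and spreads l, r > 0."""
    a: float
    l: float
    r: float

    def membership(self, x: float) -> float:
        """Membership degree with L(u) = R(u) = max(0, 1 - u)."""
        if x <= self.a:
            return max(0.0, 1.0 - (self.a - x) / self.l)
        return max(0.0, 1.0 - (x - self.a) / self.r)

    def alpha_cut(self, alpha: float) -> Tuple[float, float]:
        """Lower and upper limits of the alpha-cut, for alpha in [0, 1]."""
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        return (self.a - (1.0 - alpha) * self.l,
                self.a + (1.0 - alpha) * self.r)

    def __add__(self, other: "TFN") -> "TFN":
        """Sum of two TFNs: centres and spreads add."""
        return TFN(self.a + other.a, self.l + other.l, self.r + other.r)

    def scale(self, lam: float) -> "TFN":
        """Scalar multiple; a negative scalar swaps the two spreads."""
        if lam == 0:
            raise ValueError("lam must be non-zero to keep positive spreads")
        if lam > 0:
            return TFN(lam * self.a, lam * self.l, lam * self.r)
        return TFN(lam * self.a, -lam * self.r, -lam * self.l)


A = TFN(2.0, 1.0, 2.0)
B = TFN(1.0, 0.5, 0.5)
print(A.membership(1.5), A.alpha_cut(0.5), A + B, A.scale(-2.0))
```

The spreads stay positive under both operations, so the result of any linear combination with non-zero scalars is again a TFN.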
Definition 2.2. An $L_p$ distance measure between two fuzzy numbers (FNs) $A$ and $B$ is defined as
Any three FNs $A$, $B$ and $C$ satisfy the following conditions:
It should be noted that $g(\alpha)$ also modifies the square error distance between the two FNs $A$ and $B$ since it focuses on the values near the centers rather than the tails. This distance measure is used to evaluate the unknown components of the proposed fuzzy functional regression model and the performance of the proposed fuzzy regression model compared to other fuzzy regression models.
] is defined to be a fuzzy number A with the following α-values: where A = Here a notion of large numbers is developed for a fuzzy-valued function of A(t).
Proof 2.2. According to Definition 2.2, first note that: By the strong law of large numbers [67], we know that for a large value of N.

Fuzzy functional linear regression model
Functional data analysis is a fast-evolving branch of applied statistics, and functional regression has become popular in recent years. In this section, a functional linear regression model with functional predictors, fuzzy responses and fuzzy functional coefficients is developed. Denoting the observed data on $n$ statistical units by $(\tilde{y}_i, x_i(\cdot) = (x_{i1}(\cdot), x_{i2}(\cdot), \ldots, x_{ip}(\cdot))^\top)$, consider the following fuzzy functional linear regression model:

$$\tilde{y}_i = \tilde{\alpha} \oplus \bigoplus_{j=1}^{p} \int_a^b \big(x_{ij}(t) \otimes \tilde{\beta}_j(t)\big)\,dt \oplus \tilde{\epsilon}_i, \quad i = 1, \ldots, n, \tag{5}$$

where
1. $\tilde{y}_i = (y_i; l_{y_i}, r_{y_i})_{L_i R_i}$ represent the fuzzy responses,
2. $\tilde{\alpha}$ denotes the unknown fuzzy intercept,
3. $\tilde{\beta}_j(t) = (\beta_j(t); l_{\beta_j(t)}, r_{\beta_j(t)})_{L_j R_j}$ are the coefficients of the true fuzzy-valued function, and
4. $\tilde{\epsilon}_i$ indicates a fuzzy error term.
Based on the law of large numbers in the fuzzy domain (Lemma 2.1), the fuzzy functional linear regression model (5) can be converted to a conventional fuzzy linear regression model:
where $U_1, U_2, \ldots, U_N$ are independent random variables uniformly distributed over the interval $[a, b]$ and $N \in \mathbb{N}$ is a large number.
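The reduction described above rests on approximating an integral over $[a, b]$ by an average over uniform draws. The following minimal Python sketch illustrates that idea under the assumption that the approximation takes the usual Monte Carlo form $\int_a^b f(t)\,dt \approx \frac{b-a}{N}\sum_{k=1}^{N} f(U_k)$; the exact statement of Lemma 2.1 is not reproduced here. The function names are hypothetical, the coefficient $\beta(t)$ is represented as a (centre, left spread, right spread) triple, and the predictor curve is assumed non-negative so that the spreads remain non-negative.

```python
import random
from typing import Callable, List, Tuple

Triple = Tuple[float, float, float]  # (centre, left spread, right spread)


def uniform_draws(a: float, b: float, n: int, seed: int = 0) -> List[float]:
    """Draw U_1, ..., U_n independently and uniformly from [a, b]."""
    rng = random.Random(seed)
    return [rng.uniform(a, b) for _ in range(n)]


def mc_integral(f: Callable[[float], float], a: float, b: float,
                draws: List[float]) -> float:
    """Monte Carlo estimate (b - a) / N * sum_k f(U_k) of the integral of f."""
    return (b - a) * sum(f(u) for u in draws) / len(draws)


def mc_fuzzy_integral(x: Callable[[float], float],
                      beta: Callable[[float], Triple],
                      a: float, b: float, draws: List[float]) -> Triple:
    """Estimate the integral of x(t) * beta(t) component-wise.

    Assumes x(t) >= 0 on [a, b], so centre and spreads scale the same way.
    The same draws are reused for all three components.
    """
    centre = mc_integral(lambda t: x(t) * beta(t)[0], a, b, draws)
    left = mc_integral(lambda t: x(t) * beta(t)[1], a, b, draws)
    right = mc_integral(lambda t: x(t) * beta(t)[2], a, b, draws)
    return (centre, left, right)


draws = uniform_draws(0.0, 1.0, 20000)
estimate = mc_fuzzy_integral(lambda t: t, lambda t: (t, 0.1, 0.2),
                             0.0, 1.0, draws)
print(estimate)  # close to (1/3, 0.05, 0.1) for large N
```

Each draw $U_k$ contributes one term $x(U_k)\,\beta(U_k)$, which is why the functional model reduces to a linear model with $N$ coefficients per predictor.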

Estimation of unknown fuzzy coefficients and tuning constant
In order to estimate the fuzzy coefficients of the proposed fuzzy functional regression model (6), a regularization criterion originally presented with the SCAD penalty was extended to the reduced fuzzy multivariate regression model (6), as follows:
where $M_{\beta_{jk}} = \max\{\beta_{jk}, l_{\beta_{jk}}, r_{\beta_{jk}}\}$.
According to the proposed fuzzy regression model, the unknown regression coefficients $(\tilde{\alpha}, \tilde{B})_\lambda$ and the tuning constant $\lambda$ should be estimated simultaneously based on a set of observed values $(\tilde{y}_1, x_1^\top(\cdot)), \ldots, (\tilde{y}_n, x_n^\top(\cdot))$. Since $(\tilde{\alpha}, \tilde{B})_\lambda$ and $\lambda$ depend on one another, besides solving the optimization problem given in Eq. (10), the tuning constant can also be evaluated by minimizing the cross-validation criterion [68], i.e. $(\lambda_{(\tilde{\alpha}, \tilde{B})})_{opt} = \arg\min_{\lambda > 0} CV(\lambda)$, where:
in which
For this purpose, the values $(\tilde{\alpha}, \tilde{B})_\lambda$ were computed for many values of $\lambda$, looking for the optimal value $\lambda_{opt}$ that minimizes the leave-one-out cross-validation error ($CV$). Once found, the corresponding optimal value of $(\tilde{\alpha}, \tilde{B})$ is denoted by $(\tilde{\alpha}_{opt}, \tilde{B}_{opt})$. To this end, the Mathematica software [69] was employed.
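For reference, the SCAD penalty mentioned above can be sketched in Python as follows. The function `scad_penalty` implements the standard SCAD form of Fan and Li with the commonly used default $a = 3.7$; this default is an assumption, not a value stated in this paper. The helper `total_penalty` is hypothetical: it assumes the penalty is applied to $M_{\beta_{jk}} = \max\{\beta_{jk}, l_{\beta_{jk}}, r_{\beta_{jk}}\}$ and summed over coefficients, which is only one plausible reading of the criterion.

```python
from typing import Iterable, Tuple


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """Standard SCAD penalty p_lam(|theta|) with tuning constants lam > 0, a > 2."""
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Linear (lasso-like) part near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition part.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Constant part: large coefficients are not penalized further.
    return lam * lam * (a + 1.0) / 2.0


def total_penalty(coeffs: Iterable[Tuple[float, float, float]],
                  lam: float, a: float = 3.7) -> float:
    """Hypothetical helper: sum of SCAD penalties at M = max(centre, left, right)."""
    return sum(scad_penalty(max(c, l, r), lam, a) for c, l, r in coeffs)


print(scad_penalty(0.5, 1.0), scad_penalty(10.0, 1.0))
print(total_penalty([(0.5, 0.1, 0.2), (10.0, 1.0, 1.0)], lam=1.0))
```

The three branches join continuously at $|\theta| = \lambda$ and $|\theta| = a\lambda$, so small coefficients are shrunk toward zero while large ones incur a bounded penalty.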
Remark 3.2. To conduct a comparative study with other fuzzy regression models, four widely used performance criteria for the evaluation of fuzzy regression models were used [15]. These included:
1. Root mean square error (RMSE):
2. Mean absolute relative error (MARE):
3. Mean similarity measure (MSM):
where
4. Area under the receiver operating characteristic curve (AUROCC):
where
In addition, to examine the relationship between $\tilde{y}$ and $\hat{y}$ based on their scatter plots, the fuzzy response ($\tilde{y}$) and the corresponding estimated value ($\hat{y}$) were defuzzified to $M_y$ and $M_{\hat{y}}$, respectively, according to the Sugeno criterion [70]:
Remark 3.3. It should be pointed out that the classical functional regression methods rely on regularization with basis functions such as B-splines. Applying such methods in the fuzzy domain, however, requires more parameters in the estimation procedure. By introducing the concept of the fuzzy law of large numbers, the number of parameters in the proposed procedure was kept to a minimum.

Application examples
The feasibility and effectiveness of the proposed fuzzy functional regression model were examined based on the performance measures explained in Remark 3.2.
Example 4.1. (A simulation study) A total of m = 10 simulated data sets, each of size n = 300, were generated according to the following fuzzy functional regression model:
3. $x_{i1}(t) = t(1 - t)^{1.5} z_{i1} + w_{i1}$, where $z_{i1} \sim N(0, 0.1)$ and $w_{i1} \sim N(0, 0.9)$,
5. $x_{i3}(t) = \exp(-t)\cos(4\pi t + 0.5)\, z_{i3} + w_{i3}$, where $z_{i3} \sim N(0, 0.3)$ and $w_{i3} \sim N(0, 0.7)$,
Fig. 2. The mean values of the performance measures of the proposed method are reported in Table 1. In particular, consider the performance of the proposed fuzzy functional regression model for the 5th simulated data set (as shown in Table 2 and Fig. 3). The results indicate that $\tilde{\alpha} = (0.91; 0.15, 0.36)_T$. The performance of the proposed method was also examined by comparing the defuzzified values $M_y$ and $M_{\hat{y}}$.

Example 4.2. Prediction of the suspended sediment load in a catchment area is very important, as it can be used to evaluate the extent of the damage that has occurred in the catchment, the erosion hazard, and water management. In this example, the aim is to predict the (annual) suspended sediment discharge (ton) based on the stream water discharge (m³ per day) of the Beheshtabad River (Chaharmahal and Bakhtiari Province, Iran) using the proposed fuzzy functional regression model. Cutting through Beheshtabad Village, the river covers an area of 3866 m² (located between 31°28′ N and 32°56′ N latitude and 50°36′ E and 51°45′ E longitude). This is an important stream as it supplies water for agricultural activities, fish farms, hydroelectric power plants, and drinking uses, which makes it important to monitor the suspended sediment load of this river. It is assumed here that the suspended sediment load is an imprecise quantity that can be expressed as symmetric TFNs. The values of the suspended sediment load ($\tilde{y}_i = (y_i; l_{y_i})_T$) and the stream water discharge $x_i(t)$, $t = 1, 2, \ldots, 93$ (in summer, 93 days, as per the Solar Hijri calendar) were collected during 2000-2019.
The time series plots of $x_i(t)$ were smoothed via non-parametric kernel fitting [77] using the Minitab software. For each year, a nonlinear regression model was considered:
in which $h$ is a bandwidth and $K$ is a kernel. In this regard, the so-called triweight kernel was utilized. The cross-validation criterion was also employed to evaluate the optimal value of $h$. The plots of $x_i(t)$ and their smoothed functions (as well as $\tilde{y}_i = (y_i; l_{y_i})_T$) are given in Figs. 5-8. Consider the following univariate fuzzy functional regression model:
According to the proposed method, this gives the following fuzzy linear regression:
where $\tilde{\beta}(U_k)$ generates TFNs, $U_1, U_2, \ldots, U_N$ are independent random variables uniformly distributed over the interval $[1, 93]$, and $N \in \mathbb{N}$ is a large number. Here, it was assumed that N = 100. The results of the performance evaluations are summarized in Table 3. In addition, Fig. 9 presents the 3D plot of $\tilde{\beta}(t)$.

In order to evaluate the effect of functional predictors on the fuzzy response, the functional data were converted to scalar values (mean values over summer). Then, the proposed method was compared with some common fuzzy regression models, where $\tilde{\beta}' = \int_0^{93} \tilde{\beta}(t)\,dt$. The results of some common fuzzy univariate linear/nonlinear regression models are summarized in Table 3. A comparison among the different methods indicates that the proposed method led to more accurate results, with MSM = 0.70, MARE = 11.682, RMSE = 9.22 and AUROCC 1 5 = 10.04. The accuracy of the proposed method and of the other methods was also examined by comparing the corresponding $M_y$ and $M_{\hat{y}}$ values, as shown in Figs. 10-11, further confirming the superiority of the presented method over the other methods for this applied example. Therefore, incorporating functional data into a fuzzy regression model is expected to lead to more accurate performance measures compared to conventional fuzzy regression models with scalar data.
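The smoothing step of Example 4.2 can be illustrated with the following minimal Python sketch. It assumes a Nadaraya-Watson kernel estimator with the triweight kernel $K(u) = \frac{35}{32}(1 - u^2)^3$ for $|u| \le 1$ and a leave-one-out cross-validation choice of the bandwidth $h$. The exact estimator fitted in Minitab is not recoverable from the text, so the estimator form, the function names and the toy data below are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def triweight(u: float) -> float:
    """Triweight kernel: (35/32) * (1 - u^2)^3 on [-1, 1], zero elsewhere."""
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0


def nw_smooth(t0: float, times: Sequence[float], values: Sequence[float],
              h: float) -> float:
    """Nadaraya-Watson estimate at t0: kernel-weighted average of the values."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    weights = [triweight((t0 - t) / h) for t in times]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("no observations within the bandwidth window")
    return sum(w * v for w, v in zip(weights, values)) / total


def loo_cv(times: List[float], values: List[float], h: float) -> float:
    """Mean squared leave-one-out prediction error for bandwidth h."""
    sq_err = 0.0
    for i in range(len(times)):
        rest_t = times[:i] + times[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        try:
            pred = nw_smooth(times[i], rest_t, rest_v, h)
        except ValueError:
            return math.inf  # bandwidth too small to predict this point
        sq_err += (values[i] - pred) ** 2
    return sq_err / len(times)


def select_bandwidth(times: List[float], values: List[float],
                     candidates: Sequence[float]) -> float:
    """Return the candidate bandwidth with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_cv(times, values, h))


# Toy daily series (hypothetical data, not the river discharge records).
days = [float(t) for t in range(1, 11)]
flow = [2.0 * t for t in days]
best_h = select_bandwidth(days, flow, [0.5, 2.0, 5.0])
print(best_h, nw_smooth(5.5, days, flow, best_h))
```

A bandwidth smaller than the sampling interval leaves no neighbours for the left-out point, which is why such candidates receive an infinite cross-validation error in this sketch.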

Conclusion
Functional regression models are used to evaluate the complex relationship between repeatedly measured variables. In this paper, a regression model was built for a fuzzy response where the predictors were functions. To this end, the concept of the fuzzy integral of a fuzzy-valued function was first defined. Then, a fuzzy estimate of the fuzzy integral of the fuzzy-valued function was proposed using the law of large numbers. Next, a regularization technique was adopted with absolute error deviation, the SCAD penalty, and a cross-validation criterion to evaluate the coefficients and tuning constant of the fuzzy-valued function. The proposed regression model was subsequently examined according to several goodness-of-fit criteria via an applied example and a simulation study. The results were compared to those of some common fuzzy linear regression models in cases where the functional data were reduced to exact values. The findings indicated the higher efficiency of the proposed method over the other techniques. The proposed method can be applied to virtually any kind of LR-fuzzy response. Further research may focus on extending the proposed model to cases where the predictors are also fuzzy-valued functions. A sensitivity analysis with respect to outliers is another potential topic for further studies.