Formulating Intuitionistic Fuzzy Regression Models: A Mathematical Programming Approach

This study develops a mathematical programming approach to establish intuitionistic fuzzy regression models (IFRMs) by considering the randomness and fuzziness of intuitionistic fuzzy observations. In contrast to existing approaches, the IFRMs are established in terms of five ordinary regression models representing the components of the estimated triangular intuitionistic fuzzy response variable. The optimal parameters of the five ordinary regression models are determined by solving the proposed mathematical programming problem, which is linearized to make the resolution process efficient. Based on the concepts of randomness and fuzziness in the formulation processes, the proposed approach addresses weaknesses of existing approaches to establishing IFRMs, such as the limitation to symmetrical triangular membership (or non-membership) functions, the determination of parameter signs in the model, and the wide spread of the estimated responses. In addition, although it was developed for intuitionistic fuzzy observations, the proposed approach also allows numerical explanatory variables in the observations. In contrast to existing approaches, the proposed approach is general and flexible in applications. Comparisons show that the proposed approach outperforms existing approaches in terms of similarity and distance measures.


Introduction
Regression analysis is a useful approach for establishing the relationship between a response variable and one or more explanatory variables. In real-world applications, observations usually cannot be represented as exact (or numerical) values due to a shortage of information or the decision-maker's subjective judgment (Chen and Hsueh 2007). Linguistic terms, defined and characterized by fuzzy set theories (Zadeh 1965), are thus often adopted to describe observations. For fuzzy observations, a number of researchers have established fuzzy regression models (FRMs) by developing various approaches to determine the optimal parameters in the models based on their performance criterion, such as Chen and Hsueh (2007), Choi and Buckley (2008), Kelkinnama and Taheri (2012), and others. These approaches have been applied to many fields by Chan and Engelke (2017), Spiliotis and Garrote (2021), Boukezzoula et al. (2021), and others. Although FRMs have numerous applications, some limitations have been found. For example, the fuzzy parameters in FRMs could lead to a wider spread of the estimated fuzzy response when the magnitude of the fuzzy (or crisp) explanatory variables increases (Kao and Chyu 2002). This significantly increases the unnecessary information in the estimated fuzzy response, possibly influencing FRM performance in some applications. This has motivated studies to build up FRMs with crisp parameters by applying mathematical programming (Chen and Hsueh 2007), a least-squares method (Chen and Hsueh 2009), or least absolute deviation (Choi and Buckley 2008), or to adopt shape-preserving operations (Kelkinnama and Taheri 2012) for generating narrower spreads through fuzzy arithmetic. In addition, the signs of parameters in FRMs should be considered in the formulation processes since they influence the arithmetic operations; however, almost all approaches presumed them to be positive.
By extending fuzzy set theory (Zadeh 1965), which uses membership degrees to characterize relationship grades, Atanassov (1986) proposed intuitionistic fuzzy sets (IFSs), in which non-membership degrees are included. Observations described by IFSs can contain more information than those described by fuzzy sets. Membership and non-membership degrees are usually used to express positive and negative information, respectively, in decision-making problems. IFSs have been widely studied and applied in various problems (Atanassov 1999), such as in Hesamian et al. (2020), Ho et al. (2019), Yolcu et al. (2020), and so on. To the best of our knowledge, only a few approaches have been proposed for developing intuitionistic fuzzy regression models (IFRMs). Parvathi et al. (2013) extended the approaches for FRMs from Tanaka et al. (1982) to formulate an IFRM based on a linear programming problem to determine coefficients that are characterized as symmetrical triangular intuitionistic fuzzy numbers (TIFNs). In their study, observations contained numeric response and explanatory variables. By solving the linear programming problem, only the symmetrical TIFN parameter of the intercept term is obtained; the other parameters are always numerical due to the nature of their problem. The resulting parameters are used to determine the estimated responses, which are then defuzzified to delimit the upper and lower bounds of the estimated response variable. Arefi and Taheri (2015) proposed an IFRM based on a least-squares method where the response variables, explanatory variables, and parameters are all symmetrical TIFNs. They proposed a metric for calculating the squared errors between observed and estimated response variables to determine the symmetrical TIFN parameters. The explanatory variables and parameters were all assumed to be positive in their formulation processes; however, a negative parameter was produced in their example.
Recently, Chen and Nien (2020) extended Chen and Hsueh's (2007) approach to establish an IFRM based on the criterion of least absolute deviation. Although the problem of determining the parameter sign can be solved, their approach still suffers from the problem of producing wider spreads, since the intuitionistic fuzzy arithmetic is applied. Al-Qudaimi (2021) proposed an approach, named the Ishita approach, to build up an IFRM by formulating non-linear programming problems to determine the parameters of the IFRM. However, some unreasonable conditions are defined in the Ishita approach, since they violate the basic definitions of an IFN, leading to its resolution processes being problematic. In addition, the TIFN-parameters are also presumed to be positive in the IFRM formulation, while a negative slope TIFN-parameter is obtained in the example. This approach also has the problem that wider spreads could be produced.
Recognizing the above limitations of existing approaches for formulating IFRMs, such as producing wider spreads and the wrong setting of the sign of parameters, this study proposes mathematical programming approaches for establishing IFRMs that consider the randomness and fuzziness of TIFN observations, referring to the concepts of fuzzy random variables (Watada et al. 2009). Randomness is measured based on the most likely value (center) of a fuzzy number, and fuzziness is represented by the spreads of the fuzzy number. This study adopts these concepts to decompose the representations of intuitionistic fuzzy numbers (IFNs) into two critical parts to simplify the formulation processes. More importantly, the characteristics of IFNs can be considered in the IFRM formulations by individually expressing the five components of an IFN, i.e., the most likely value, the left/right spreads of the membership function, and those of the non-membership function. Taking randomness and fuzziness into account, a mathematical programming model that is restricted to five ordinary regression models representing the five components of the estimated TIFN response variable, respectively, is formulated to minimize the least absolute deviations to build up the optimal IFRM. The proposed approaches overcome the weaknesses of existing approaches, such as the determined IFRM having wider spreads and the problem of determining the sign of parameters, because the resulting IFRM is determined from the five ordinary regression models representing the five components of the estimated TIFN response variable, respectively, in the formulation processes.
The objective of this study is to develop a mathematical programming approach based on the least absolute deviations for formulating the optimal IFRM considering the randomness and fuzziness of intuitionistic fuzzy observations. The proposed approach overcomes the weaknesses of existing approaches by ensuring that the spreads of the resulting IFRM will not be wide and the problem of the sign of parameters does not require any concern. Furthermore, although the proposed approach focuses on the development of IFNs, it actually can deal with observations containing numerical, fuzzy, intuitionistic fuzzy, or mixed variable types. The rest of this paper is organized as follows. Some basic concepts of IFSs and related measures, such as the similarity measure, squared errors measure, and distance measure, are described briefly in Section 2. In Section 3, considering randomness and fuzziness, the optimal IFRM is formulated by solving the proposed mathematical programming model to minimize the least absolute deviations in terms of five ordinary regression models representing the five components of the estimated TIFN response variable, respectively. The proposed approaches are general, since the explanatory variables in the observations are not required to be symmetrical TIFNs; even numerical observations are allowed. Two examples are used to demonstrate performance via comparisons with existing approaches in Section 4. A dataset is used to demonstrate the applicability of the proposed model in Section 5. The comparison analyses with existing approaches are conducted, and the advantages of the proposed approach are explained in Section 6. Finally, Section 7 provides some conclusions.

Background
Intuitionistic fuzzy sets (IFSs), proposed by Atanassov (1986), are a generalization of fuzzy sets that includes the degrees of membership and non-membership.
Definition 1 (Atanassov 1986; Xu and Yager 2006): Let X denote a universe of discourse. An IFS $\tilde{A}$ in X is given by $\tilde{A} = \{ \langle x, \mu_{\tilde{A}}(x), \nu_{\tilde{A}}(x) \rangle \mid x \in X \}$, where $\mu_{\tilde{A}}(x)$ and $\nu_{\tilde{A}}(x)$ denote the membership and non-membership degrees of x, with $0 \le \mu_{\tilde{A}}(x) + \nu_{\tilde{A}}(x) \le 1$. […]
Generally, the membership and non-membership functions of $\tilde{A}$ can be characterized in the following forms: […]
Definition 4 (Guha and Chakraborty 2010): The α-cut of an IFS $\tilde{A}$ is defined as: […] The inequality […] and thus $\tilde{A}_{\alpha}$ can be expressed as the crisp sets […]. Fig. 3 shows that the two crisp sets […] have the same intervals. To be compatible with a previous study (Arefi and Taheri 2015), […]. Based on this definition, the α-cuts of a TIFN $\tilde{A}$ can be formulated in the following general form: […] The two extreme cases are: […]
Definition 5 (Chakraborty et al. 2014): […]
A useful measure of IFNs is necessary to evaluate the performance of an IFRM. For this purpose, Arefi and Taheri (2015) proposed two measures, namely a similarity measure based on the absolute difference of membership and non-membership degrees of two TIFNs and a squared difference measure, as defined below.
Definition 6 (Arefi and Taheri 2015): […] with the value of 0. This index is measured in terms of the area of the average difference between the membership and non-membership functions of two IFNs. However, the degree of difference cannot be determined if the two IFNs have no intersections.
Definition 7 (Arefi and Taheri 2015): […]
Another distance measure based on the absolute difference of α-cuts, shown below, is also adopted in this study. […]
where p is a parameter with $p \ge 1$. The measure $D_p$ is determined by calculating the integral of the average absolute difference of all α-cuts based on the parameter p, which is usually set to 1. Furthermore, the three measures mentioned above, i.e., the similarity measure, squared difference measure, and distance measure, all satisfy their respective axioms, as proven in previous studies (Arefi and Taheri 2015; Grzegorzewski 2003).

Intuitionistic fuzzy regression modeling
The formulation processes of an IFRM are introduced in this section. An IFRM usually suffers from the problems of determining the sign of parameters in the formulation processes and producing wider spreads in the estimations, thus influencing IFRM performance. To avoid these problems, the proposed approach considers that randomness and fuzziness are the two critical factors that affect IFRM performance. To deal with randomness and fuzziness, the formulation of IFRMs is carried out by directly determining the five components of the estimated TIFN response, where each component is formulated as an ordinary regression function of the corresponding ones of the observed TIFN explanatory variables. Mathematical programming problems are formulated to determine the optimal numerical parameters of each component of the estimated TIFN response to minimize the total difference between the observed and estimated TIFN responses.
Regarding the randomness and fuzziness of a fuzzy number (Fong and Hong 2010), the most likely value can be expressed as a random variable and the left and right spreads can be used to express fuzziness. In this study, for a TIFN […]. To evaluate the differences between the observed and estimated TIFN responses in terms of randomness and fuzziness, the α-cuts of TIFNs are adopted, since they can easily be used to measure the distance between two membership (non-membership) functions at each α level.
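The role of α-cuts in separating randomness from fuzziness can be illustrated with a minimal Python sketch. It assumes a five-component TIFN (most likely value, left/right membership spreads, left/right non-membership spreads) with linear sides, and the cut convention {x : μ(x) ≥ α} for membership and {x : ν(x) ≤ 1 − α} for non-membership. The names `TIFN`, `alpha_cuts`, and `alpha_cut_difference`, as well as the equal weighting of endpoints and levels, are illustrative assumptions and not the measure defined in this paper.

```python
from collections import namedtuple

# Hypothetical five-component representation of a TIFN:
# m = most likely value, l_mu/r_mu = membership spreads,
# l_nu/r_nu = non-membership spreads.
TIFN = namedtuple("TIFN", "m l_mu r_mu l_nu r_nu")


def alpha_cuts(t, alpha):
    """Return (membership cut, non-membership cut) as closed intervals.

    Assumed convention: {x : mu(x) >= alpha} and {x : nu(x) <= 1 - alpha},
    with linear (triangular) sides, so both cuts shrink to m at alpha = 1.
    """
    k = 1.0 - alpha
    membership = (t.m - k * t.l_mu, t.m + k * t.r_mu)
    non_membership = (t.m - k * t.l_nu, t.m + k * t.r_nu)
    return membership, non_membership


def alpha_cut_difference(a, b, r=11):
    """Mean absolute endpoint deviation over r evenly spaced alpha levels."""
    if r < 2:
        raise ValueError("r must be at least 2")
    total = 0.0
    for k in range(r):
        alpha = k / (r - 1)
        for (lo_a, hi_a), (lo_b, hi_b) in zip(alpha_cuts(a, alpha),
                                              alpha_cuts(b, alpha)):
            total += abs(lo_a - lo_b) + abs(hi_a - hi_b)
    return total / (4 * r)  # four endpoints per level


observed = TIFN(10, 1, 1, 2, 2)
shifted = TIFN(12, 1, 1, 2, 2)   # same spreads, different center
wider = TIFN(10, 2, 2, 3, 3)     # same center, wider spreads
print(alpha_cut_difference(observed, shifted))
print(alpha_cut_difference(observed, wider))
```

In this sketch a shift of the center contributes the same deviation at every α level, whereas a difference in spreads shrinks linearly as α approaches 1, which is why the two sources of deviation can be treated separately.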
Let a TIFN observation set be denoted as […]. Based on a previous study that presented a distance measure for fuzzy numbers (Chen and Hsueh 2007), the difference between the ith observed and estimated TIFN responses, $D_i$, can be determined from the average deviations of r corresponding α-cuts as: […] where […]. Based on Definition 4, the first part of $D_i$ can be reformulated as: […] The above equation is derived from the absolute differences of the most likely value and the left spread of membership functions between the observed and estimated TIFN responses, i.e., the differences of the randomness […]. Two TIFNs are thus equal if there are no differences in randomness and fuzziness between them. This also indicates that an IFRM with better explanatory power will produce a smaller difference compared with the observed responses. To establish such an IFRM, each component of the estimated TIFN responses should be able to be explained by the corresponding ones in the observed TIFN explanatory variables. To this end, each component of the estimated TIFN responses can be formulated by an ordinary regression model as follows: […] where […] is the numerical estimated parameter vector for the jth TIFN explanatory variable.
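The idea of one ordinary regression model per component can be sketched in Python as follows. The component labels, the dictionary layout, the function names, and all parameter values are hypothetical and chosen only for illustration. The validity check assumes the usual TIFN condition that spreads are non-negative and the non-membership spreads are at least as large as the membership spreads.

```python
# Hypothetical labels for the five components of a TIFN.
COMPONENTS = ("m", "l_mu", "r_mu", "l_nu", "r_nu")


def predict_components(params, x):
    """Evaluate five ordinary linear models, one per component.

    params[c] = [b0, b1, ..., bp] are crisp coefficients for component c;
    x[c] = [x1, ..., xp] holds component c of each explanatory variable.
    """
    return {
        c: params[c][0] + sum(b * v for b, v in zip(params[c][1:], x[c]))
        for c in COMPONENTS
    }


def is_valid_tifn(y):
    """Assumed validity check: non-negative spreads, and non-membership
    spreads not smaller than the membership spreads."""
    spreads_ok = all(y[c] >= 0 for c in COMPONENTS[1:])
    return spreads_ok and y["l_nu"] >= y["l_mu"] and y["r_nu"] >= y["r_mu"]


# Illustrative (made-up) coefficients: intercept plus two slopes.
params = {
    "m": [1.0, 2.0, 0.5],
    "l_mu": [0.0, 2.0, 0.5],
    "r_mu": [0.0, 2.0, 0.5],
    "l_nu": [0.0, 2.0, 0.5],
    "r_nu": [0.0, 2.0, 0.5],
}
# The second explanatory variable is numeric, so all of its spreads are zero.
x = {
    "m": [10.0, 30.0],
    "l_mu": [1.0, 0.0],
    "r_mu": [1.2, 0.0],
    "l_nu": [1.5, 0.0],
    "r_nu": [1.8, 0.0],
}
y_hat = predict_components(params, x)
print(y_hat, is_valid_tifn(y_hat))
```

Because each spread is computed from the corresponding spreads of the explanatory variables rather than by fuzzy multiplication, a numeric variable with zero spreads adds nothing to the estimated spreads in this sketch, whatever the sign or size of its coefficients.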
To minimize the overall difference between the observed and estimated TIFN responses, the optimal estimated parameters of TIFN explanatory variables should be determined based on Eqs. (15) […]. The determination of these parameters should guarantee that the estimated responses satisfy the definition of a TIFN. That is, the spreads should all be greater than or equal to zero, and […]. After solving model (25), the optimal estimated parameters for the components of the estimated TIFN responses can be obtained to minimize the absolute deviations with the corresponding ones of the observed TIFN responses. In the proposed formulation processes, the IFRM is constructed by determining the optimal components of the estimated TIFN responses using the corresponding ones of the observed TIFN explanatory variables. That is, the optimal IFRM is established in terms of five ordinary regression models, considering randomness and fuzziness. The product operator of TIFNs is avoided in the model formulations to prevent the generation of wider spreads. In addition, the crisp parameters in model (25) are formulated for the components of the estimated TIFN responses to avoid the problem of the signs of the parameters in IFRMs, so as to decrease the impact on forecasting performance. In particular, even if the observation set contains some numerical explanatory variables in which spreads are missing, the proposed approach can still be applied to establish the optimal IFRM. Furthermore, the objective function in model (25) contains absolute deviations and is therefore nonlinear. To increase computational efficiency, linearization techniques (Kelkinnama and Taheri 2012) can be adopted to reformulate the model.
For example, for two non-negative deviation variables, […]. Similarly, other non-zero positive and negative deviation variables can also be defined, such as […]. The reformulated linear model (31) is a general linear programming model, and can thus be easily solved to determine the optimal parameters […], j = 0, 1, …, p, using commercial software, such as LINGO (Anderson et al. 2017). To formulate and solve this model, only the most likely value and left/right spread of the membership/non-membership function from the IFN dataset are required. In addition, when the number of TIFN explanatory variables in an intuitionistic fuzzy dataset increases, only the numbers of corresponding components in the first five constraints increase. The solving process is quite efficient, even for datasets with a large quantity of observations and/or explanatory variables, because model (31) is linear.
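The standard way of linearizing an absolute deviation with a pair of non-negative deviation variables can be written out for a single component as follows. This is a generic sketch of the technique, not a reproduction of model (31): the symbols $d_i^{+}$, $d_i^{-}$, $y_i$, $x_{ij}$, and $b_j$ are assumed notation for one of the five components, and the TIFN validity constraints that link the components are omitted.

```latex
% Assumed notation: y_i and x_{ij} are one component (e.g. the most likely
% value) of the i-th observed response and j-th explanatory variable.
\begin{aligned}
\min_{b,\, d^{+},\, d^{-}} \quad & \sum_{i=1}^{n} \left( d_i^{+} + d_i^{-} \right) \\
\text{s.t.} \quad & y_i - \Big( b_0 + \sum_{j=1}^{p} b_j x_{ij} \Big) = d_i^{+} - d_i^{-},
  \qquad i = 1, \dots, n, \\
& d_i^{+} \ge 0, \quad d_i^{-} \ge 0, \qquad i = 1, \dots, n, \\
& b_j \ \text{unrestricted in sign}, \qquad j = 0, 1, \dots, p.
\end{aligned}
```

At an optimal solution at most one of $d_i^{+}$ and $d_i^{-}$ is positive for each i, so their sum equals the absolute residual and the linear objective coincides with the sum of absolute deviations. Leaving $b_j$ unrestricted in sign is what removes the need to fix parameter signs in advance.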

Demonstration and comparisons
This study builds an IFRM in terms of five ordinary linear regression models based on the linear relations between components in the estimated TIFN response variable and the corresponding ones in the TIFN explanatory variables, considering randomness and fuzziness. For demonstration and comparisons, examples from Al-Qudaimi (2021) and Arefi and Taheri (2015) are adopted in this section.
Example 1 The example from Al-Qudaimi (2021) contains four patterns with one explanatory variable. Let $\tilde{Y}$ be the volume (given in liters) of one mole of methane gas under a constant pressure of one atm, and $\tilde{X}$ be the temperature in degrees Celsius (℃). They are represented as symmetrical TIFNs and listed in Table 1. Al-Qudaimi (2021) […] (32) and (33) in terms of similarity (10) and distance measures (11) and (12). The proposed IFRM (33) outperforms model (32) on all three measures.
Example 2 This example adopts a dataset with 24 symmetrical TIFN observations (Arefi and Taheri 2015), as listed in Table 3. For each observation, the response variable, namely the cation exchange capacity ($\tilde{Y}$), and two explanatory variables, namely the sand content percentage ($\tilde{X}_1$) and organic matter content ($\tilde{X}_2$), are used. Arefi and Taheri (2015) developed an IFRM approach based on the criterion of the least squared errors using the observations with symmetrical TIFN response and explanatory variables. To demonstrate their approach, the dataset, listed as […].
As is the case in the previous example, the resulting optimal parameters for the right and left spreads of the membership/non-membership function are also identical, since observations in this example are also symmetrical TIFNs. Then, the IFRM, $\hat{\tilde{Y}}_{CN}$, can be established as follows: […] The results of performance comparisons among the IFRM (34) from Arefi and Taheri's (2015) approach, the IFRM (35) from Chen and Nien's (2020) approach, and the IFRM (36) based on the proposed formulation, in terms of the three performance measures, are listed in Table 4. The proposed IFRM (36) outperforms models (34) and (35). The total similarity SM from (36) is 11.8053, which is higher than 11.3919 from (34) by 3.62%, and higher than 11.3640 from (35) by 3.88%. In addition, the proposed approach's squared difference measure is lower than that for Arefi and Taheri's (2015) approach by 9.7607 (13.94%), and lower than that for Chen and Nien's (2020) approach by 0.5144 (8.47%). As for the distance measure $D_{p=1}$, the proposed approach is lower than Arefi and Taheri's (2015) approach by 8.5502 (23.44%), and lower than Chen and Nien's (2020) approach by 0.6351 (2.22%).
The four kinds of TIFN response, $\tilde{Y}$, $\tilde{Y}_{LSM}$, $\tilde{Y}_{LAD}$, and $\tilde{Y}_{CN}$, can also be graphically compared. As shown in Fig. 4, […] and $\tilde{Y}_{CN}$ for all estimated TIFN observations are almost equivalent; however, $\tilde{Y}_{LSM}$ has the widest spreads of the three approaches, and $\tilde{Y}_{CN}$ has the smallest spreads among the three IFRMs. The results in Table 4 and Fig. 4 show that the proposed approach significantly outperforms the existing approaches in terms of the similarity measure and the two distance measures.

Application
A dataset (Freund and Wilson 2003) is adopted in this section to demonstrate the applicability of the proposed models. The collected dataset contains 47 observations for estimating the weight of a tree. Before being harvested, a number of trees were measured on four indices, namely the diameter at breast height (DBH) ($X_1$), height ($X_2$), age ($X_3$), and specific gravity of the wood (GRA) ($X_4$). Subsequently, the trees were harvested and their weights ($Y$) were measured. Although the first two indices (DBH and height) can be easily measured, they are not precise since the trees usually have irregular shapes. GRA is roughly estimated, and tree age can be exactly known.
Suppose that three indices and the weight in the dataset have measurement errors. Let the four indices be explanatory variables, and the weight be the response variable. For demonstration, TIFNs are used to characterize the three explanatory variables, $\tilde{X}_1$, $\tilde{X}_2$, and $\tilde{X}_4$, and the response variable, $\tilde{Y}$. The TIFNs are defined as follows. Let the original data be the most likely value (mean value) of the TIFN, and the left and right membership spreads of $\tilde{X}_1$, $\tilde{X}_2$, $\tilde{X}_4$, and $\tilde{Y}$ be randomly distributed from 8% to 10%, 8% to 10%, 10% to 12%, and 8% to 10% of the corresponding mean value, respectively. Similarly, let the left and right non-membership spreads be randomly distributed from 10% to 12%, 10% to 12%, 12% to 14%, and 10% to 12% of the corresponding mean value, respectively. The fuzzified data are listed in Table 5. Applying the proposed approach, the IFRM is built in terms of five ordinary regression models as: […] are zero, since ages are numeric in the dataset. In addition, the resulting optimal parameters for the right and left spreads of the membership/non-membership function for the other three explanatory variables are not the same, since TIFN observations in this example are not symmetrical. The performance analyses of this application are presented in the following section.
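The fuzzification step described above can be sketched in a few lines of Python. The sketch assumes that "randomly distributed" means independent uniform draws for each spread; the function name `fuzzify`, the seed, and the sample value 5.7 are hypothetical and are not taken from the dataset.

```python
import random


def fuzzify(value, mu_range, nu_range, rng):
    """Turn a crisp measurement into the five components of a TIFN.

    value    : crisp measurement, used as the most likely value
    mu_range : (low, high) fractions for the membership spreads
    nu_range : (low, high) fractions for the non-membership spreads
    Assumption: each spread is an independent uniform draw within its range.
    """
    l_mu = value * rng.uniform(*mu_range)
    r_mu = value * rng.uniform(*mu_range)
    l_nu = value * rng.uniform(*nu_range)
    r_nu = value * rng.uniform(*nu_range)
    return (value, l_mu, r_mu, l_nu, r_nu)


rng = random.Random(2024)  # fixed seed so the sketch is reproducible
# Ranges stated in the text for DBH: 8%-10% (membership), 10%-12% (non-membership).
dbh = fuzzify(5.7, (0.08, 0.10), (0.10, 0.12), rng)
print(dbh)
```

Because the left and right spreads are drawn separately, the resulting TIFNs are generally asymmetrical. Since the non-membership range starts where the membership range ends, each non-membership spread is also at least as large as the corresponding membership spread for positive values.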

Discussion
Building on the above demonstrations and application, this section provides further explanation and discussion. In Example 1, Al-Qudaimi's (2021) approach is adopted to construct an IFRM with a negative TIFN explanatory variable. This approach derives the formulation processes of IFRMs, considering that TIFN explanatory variables may be negative, or that their membership and/or non-membership functions may cross zero in the domain variable.
However, some unreasonable conditions are defined in the formulation processes, since they violate the basic definitions of an IFN, so that the formulated IFRMs are problematic. The formulations are carried out by non-linear programming problems to determine the parameters of the IFRM, which reduces the resolution efficiency. In addition, this approach has the problem that wider spreads could be produced. Using the dataset from Al-Qudaimi (2021), the proposed approach is also applied to formulate an IFRM by five ordinary regression models, even for negative TIFN explanatory variables. Based on the performance comparisons, the proposed approach performs much better than Al-Qudaimi's (2021) approach.
In Example 2, the parameter sign of $\tilde{X}_1$ in the IFRM from Arefi and Taheri's (2015) approach is negative; however, their formulation processes are derived based on the assumption that all parameters are positive TIFNs. Chen and Nien's (2020) approach and the proposed approach are not subject to such a restriction. Furthermore, in this example, the TIFN response and explanatory variables of the observations are all symmetrical TIFNs. Their spreads are defined as a proportion of the corresponding mean value, so that mathematically the spread can be considered as a function of the corresponding mean value. Consequently, the parameters of the estimated spreads in Eq. (36) are also proportional to the corresponding ones of the estimated mean value. For example, for $\hat{s}_{y}^{ML}$, the following relations can be obtained: […], is needed in the formulation processes, if the spreads are set as above. However, if the spreads of the observed TIFN response and explanatory variables are not set as above, as is the general case in real-world applications, the proposed approach can still be applied (the above proportional relations will not arise). In contrast, if the observations are not symmetrical TIFNs, the method proposed by Arefi and Taheri (2015) cannot be employed.
As Fig. 4 shows, with smaller spreads, the coverage of the estimated TIFN responses by the proposed approach better matches that of the observed ones, and thus the distances between them are smaller than those from the other two approaches (Table 4). This indicates that, by individually considering the fuzziness and randomness of IFNs, the proposed approach can significantly improve the estimation performance, since wider spreads are not produced.
In the application case, the TIFN response and explanatory variables are not symmetrical, so Arefi and Taheri's (2015) approach cannot be employed. By applying the proposed approach, the estimated TIFN response variable can be determined to be asymmetrical. The total absolute distance measure ($D_{p=1}$) between the observed and estimated weights is 4027.08 in this case, which is quite close to the value of 4012.83 obtained using the traditional regression method with the mean values of the dataset. In particular, the explanatory variable age is numeric in the dataset, so that only randomness exists for this variable. By applying the proposed approach, the fuzziness characterized by spreads of this variable in the four ordinary regression models is ignored, as shown in (37), so that this variable does not add fuzziness to the estimated TIFN response, i.e., the spread is not increased by this variable. This shows that the proposed approach can effectively be applied to estimation purposes for TIFN or mixed-type observations in uncertain environments.
In sum, some important features of the proposed approach can be briefly described as follows. With the consideration of fuzziness and randomness of TIFN observations, the proposed approach formulates IFRMs by five ordinary regression models to characterize the five components of the estimated TIFN response. A mathematical programming problem is formulated to determine the optimal parameters of the five ordinary regression models. The resulting parameters are numerical, and the five components of the estimated TIFN response are directly determined using those parameters. Such formulation processes improve on the weaknesses of existing approaches in three ways. First, wide spreads are not produced in the estimated TIFN response, since intuitionistic fuzzy arithmetic is not used and the spreads are individually determined. Second, in contrast to existing approaches, the parameter signs of IFRMs are not a concern in the formulation processes. Third, the proposed approach can deal with various data types of observations, such as asymmetrical IFNs, negative IFNs, or even numerical values. In addition to its estimation performance, the proposed approach is general and flexible in real-world cases with uncertain information.

Conclusion
With the consideration of fuzziness and randomness of TIFN observations, this study applied mathematical programming approaches based on the criterion of the least absolute deviations to build IFRMs in terms of five ordinary regression models, each representing one component of the estimated TIFN response variable. Each component of the estimated TIFN response variable is determined by the corresponding ordinary regression model. The proposed approach has several advantages. Unlike an existing approach, the observations used in this study are not limited to symmetrical TIFNs. The randomness and fuzziness of TIFN response and explanatory variables are included in the formulation processes so that each component of the estimated TIFN response variable can be individually estimated, subject to the definition of a TIFN, thus avoiding wider spreads that may include unnecessary information. This formulation process also removes the need to determine the parameter signs in IFRM approaches beforehand, since the proposed approach applies mathematical programming to estimate each component of the estimated TIFN response variable. In particular, besides the TIFN explanatory variables, observations that include numeric variables are also allowed in the proposed approach, increasing its applicability. Demonstrations and comparisons showed that the proposed approach outperforms existing approaches in terms of similarity and distance measures. In future research, different kinds of regression models, not limited to linear models, representing the components of the estimated TIFN response variable could be investigated for possibly enhancing the model performance.