Interval data aggregation technology for large scale decision making

Abstract
In this paper, an improved multiple attribute decision making (MADM) method, based on a novel score function and accuracy function for interval-valued intuitionistic fuzzy numbers (IVIFNs), is proposed to aggregate large-scale data. The attribute values in the decision matrices provided by each decision-maker (DM) are characterized by interval numbers. First, a transformation matrix is introduced to define the satisfactory set, dissatisfactory set and uncertain set of alternatives. An approach is then developed for aggregating attribute values into IVIFNs, which yields the collective evaluation of each alternative. Next, using the interval-valued intuitionistic fuzzy weighted averaging operator, the collective attribute values characterized by IVIFNs are aggregated to obtain the overall evaluation of the alternatives. The score function and accuracy function are then applied to calculate the score and the rank of each alternative. Finally, a large-scale example is given to verify the validity of the reported method.


Introduction
The explosive growth of information has become an important turning point for decision analysis. A large amount of previously inaccessible data is readily available in today's digital era, and some seemingly unrelated data can be of great value when combined. People want to extract regular knowledge from uncertain data to support decision-making. However, there are many uncertainties in real large-scale data environments, and the collected data are often noisy, inaccurate, or even incomplete.
The fuzzy set [1] is a powerful mathematical tool for dealing with uncertainty in the decision-making environment. Its effectiveness has been proven in various application fields, and it is one of the research hotspots in intelligent computing theory and its applications [2][3][4]. Atanassov [5,6] introduced the intuitionistic fuzzy set (IFS) [5,7], which is a generalization of the fuzzy set [1] and is more suitable for dealing with fuzziness and uncertainty than the ordinary fuzzy set.
Because of the flaws in Xu's ranking method, many scholars have studied the score function and the accuracy function. For example, to overcome the drawback of Xu's accuracy function, Ye [11] proposed an improved accuracy function. However, the scope of solutions of the proposed techniques for ranking IVIFNs using a score function is limited. For this reason, Yao [13] proposed a new score function to overcome this problem. Xu's score and accuracy functions are based only on the membership and non-membership intervals of the IVIFNs. Taking the hesitation interval of the IVIFNs into account, Wang [14] proposed a new score function. Gao [15] proposed a decision analysis method based on prospect theory, which depends on a new score function to convert IVIFNs into real numbers. ... the correct order between the two IVIFNs.
Gong [18] analyzed the shortcomings of several current score functions and accuracy functions. To overcome these defects, Gong proposed new score functions and accuracy functions based on the total probability formula of probability theory. However, according to Eq. (15), two different IVIFNs can receive the same value, and in this case we cannot distinguish which IVIFN is greater.
In general, the score function is a measure that characterizes the degree to which a decision object has a certain attribute. The higher the score value, the more strongly the decision object has this characteristic; the lower the score value, the less it has it. The accuracy function is a measure that evaluates whether an IVIFN accurately describes the attribute characteristics of the decision object. The larger the value of the accuracy function, the more accurate the description given by the IVIFN; the smaller the value, the fuzzier the description.
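To make the roles of the two measures concrete, the following minimal Python sketch ranks IVIFNs by score first and by accuracy second. It assumes the form commonly attributed to Xu for an IVIFN ([a, b], [c, d]), namely s = (a - c + b - d)/2 and h = (a + b + c + d)/2; the names `IVIFN`, `xu_score`, `xu_accuracy` and `compare`, and the numerical values, are illustrative and are not taken from this paper. It is not the novel score function proposed here.

```python
from typing import NamedTuple


class IVIFN(NamedTuple):
    """Interval-valued intuitionistic fuzzy number ([a, b], [c, d])."""
    a: float  # lower bound of the membership interval
    b: float  # upper bound of the membership interval
    c: float  # lower bound of the non-membership interval
    d: float  # upper bound of the non-membership interval


def xu_score(x: IVIFN) -> float:
    # Assumed form of Xu's score function: values lie in [-1, 1].
    return (x.a - x.c + x.b - x.d) / 2


def xu_accuracy(x: IVIFN) -> float:
    # Assumed form of Xu's accuracy function: values lie in [0, 1].
    return (x.a + x.b + x.c + x.d) / 2


def compare(x: IVIFN, y: IVIFN, tol: float = 1e-12) -> int:
    """Return 1 if x ranks above y, -1 if below, 0 if indistinguishable."""
    ds = xu_score(x) - xu_score(y)
    if abs(ds) > tol:
        return 1 if ds > 0 else -1
    dh = xu_accuracy(x) - xu_accuracy(y)
    if abs(dh) > tol:
        return 1 if dh > 0 else -1
    return 0


# Two different (hypothetical) IVIFNs that tie on both measures.
alpha1 = IVIFN(0.4, 0.5, 0.3, 0.4)
alpha2 = IVIFN(0.3, 0.6, 0.3, 0.4)
tie = compare(alpha1, alpha2)
```

Here `tie` evaluates to 0 although `alpha1` and `alpha2` differ, which illustrates the kind of situation in which a score and accuracy pair fails to separate two IVIFNs.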
The main drawback of the above methods is that they focus only on the membership and non-membership information of the IVIFN, and ignore or make insufficient use of the hesitation information. To overcome these flaws, a more effective score function and accuracy function are proposed to rank IVIFNs by comprehensively considering the membership, non-membership and hesitation information of the IVIFN. From Table 1, we can see that the score and accuracy functions introduced above yield an unreasonable ranking of IVIFNs in some situations, whereas the score and accuracy functions defined in this paper give the correct ranking.

A review of Yue's MADM method
In this section, we analyze the flaws of Yue's MADM method [27].
Step 2. Normalize the decision matrix.
In general, there are benefit attributes and cost attributes in multiple attribute decision-making problems [28]. Yue [27] introduced new formulas to normalize each attribute value in $X_i$. The normalized group decision matrix $Y_i$ is constructed by Eq. (19) for benefit attributes or by Eq. (20) for cost attributes [27].
Step 3. Calculate the mean values and standard deviations of the endpoints of the intervals in $Y_i$.
In order to aggregate attribute vectors into an IVIFN, Yue introduced the statistical ideas of mean and variance. Yue pointed out that the mean and standard deviation of the left endpoints (or right endpoints) of the intervals can reflect the degree of dissatisfaction (or satisfaction) to a certain extent.
Step 4. Determine the lower and upper bounds of the dissatisfaction interval and of the satisfaction interval.
The bounds of the two intervals are chosen considering that 0.5 is the natural boundary of [0,1].
Step 5. Perform a linear transformation on the endpoints of the dissatisfaction interval and the satisfaction interval.
Step 6. Calculate the induced IVIFN of each attribute vector of $Y_i$, then obtain the collective decision matrix.
Step 7. Calculate the overall evaluation.
According to Definition 2.3, all elements in each row of the matrix are aggregated into the overall IVIFN of the alternative $A_i$.
Step 8. Rank the preference order of the alternatives.
According to Xu's score and accuracy functions shown in Definition 2.4 and the ranking method of IVIFNs shown in Definition 2.5, we obtain the preference order of the alternatives.
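The aggregation step that produces the overall evaluation can be sketched as follows. This is a minimal illustration that assumes the standard form of the interval-valued intuitionistic fuzzy weighted averaging operator, in which the membership bounds are combined as 1 - prod((1 - a_j)^w_j) and the non-membership bounds as prod(c_j^w_j); Definition 2.3 itself is not reproduced in this section, and the function name `iifwa` and the sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple

# An IVIFN ([a, b], [c, d]) stored as the tuple (a, b, c, d).
IVIFNTuple = Tuple[float, float, float, float]


def iifwa(values: Sequence[IVIFNTuple], weights: Sequence[float]) -> IVIFNTuple:
    """Weighted averaging of IVIFNs (assumed standard operator form)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    pairs = list(zip(values, weights))
    # Membership bounds: complement of the weighted geometric mean of complements.
    a = 1 - math.prod((1 - v[0]) ** w for v, w in pairs)
    b = 1 - math.prod((1 - v[1]) ** w for v, w in pairs)
    # Non-membership bounds: weighted geometric mean.
    c = math.prod(v[2] ** w for v, w in pairs)
    d = math.prod(v[3] ** w for v, w in pairs)
    return (a, b, c, d)


# Hypothetical collective evaluations of one alternative under two attributes.
row = [(0.4, 0.5, 0.3, 0.4), (0.6, 0.7, 0.1, 0.2)]
overall = iifwa(row, [0.5, 0.5])
```

With equal weights, the result lies between the two inputs in each component, and aggregating identical IVIFNs returns the same IVIFN, which is a quick sanity check on the operator.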
However, in this paper, we point out that Yue's method [27] has the following shortcomings:
a. Step 2 of the method may lose information, so the preference order of the alternatives cannot be distinguished in some situations.
b. Some parameters of the alternatives obtained in Step 4 may be in reverse order, which is illogical.
In the first counter example, the alternatives are evaluated with respect to attribute $u_1$ using the hundred-mark system and attribute $u_2$ using the sixty-point system. The score values of the attributes are shown in Table 2.
Based on Eq. (19) and the decision matrix $X_i$, we can obtain the normalized matrix $Y_i$ shown in Table 3. The parameters of the attributes of each alternative are shown in Table 4 and Table 5. To compare the results, as we supposed at the beginning, we set the attribute weights to $w_1 = w_2 = 0.5$. Using Eq. (28), we can obtain the collective evaluation of each attribute, characterized by IVIFNs, in Table 6. The score of the overall evaluation and the preference order of each alternative can be calculated by Definition 2.4 and Definition 2.5, and are summarized in Table 7.
Therefore, Yue's method gets an incorrect order of $A_1$ and $A_2$. Yue's method has the flaw that, in some situations, it cannot obtain the correct preference order of the alternatives because information is lost. The reason is that, in the normalization process, Eqs. (19) and (20) consider only the distribution of the data itself (local information) and ignore its overall relationship with the attribute interval (global information). Therefore, we introduce attribute interval information into the normalization process to overcome this defect.
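The difference between local and global normalization can be illustrated with a small Python sketch. Neither function below is Yue's Eq. (19) or the normalization proposed in this paper: `normalize_local` is a plain min-max stand-in for any normalization that uses only the observed data, and `normalize_global` is a hypothetical version that uses the known range of the attribute scale (for example 0 to 100 for the hundred-mark system). The sample intervals are invented for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def normalize_local(column: List[Interval]) -> List[Interval]:
    """Min-max normalization using only the observed endpoints (local)."""
    lo = min(left for left, _ in column)
    hi = max(right for _, right in column)
    span = hi - lo
    if span == 0:
        raise ValueError("all endpoints are identical")
    return [((left - lo) / span, (right - lo) / span) for left, right in column]


def normalize_global(column: List[Interval],
                     scale_min: float, scale_max: float) -> List[Interval]:
    """Normalization against the full attribute scale (global)."""
    span = scale_max - scale_min
    if span <= 0:
        raise ValueError("scale_max must exceed scale_min")
    return [((left - scale_min) / span, (right - scale_min) / span)
            for left, right in column]


# Hypothetical interval scores on a hundred-mark scale; both are high marks.
scores = [(80.0, 90.0), (85.0, 95.0)]
local = normalize_local(scores)                  # stretched over all of [0, 1]
global_ = normalize_global(scores, 0.0, 100.0)   # stays near the top of [0, 1]
```

Under the local variant the first interval starts at 0 even though 80 out of 100 is a high mark, whereas the global variant keeps both intervals in the upper part of [0, 1]. This is the kind of information about the attribute interval that a data-only normalization discards.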

The counter example of illogical parameters and faulty evaluation results
In the second counter example, the alternatives are evaluated with respect to attributes $u_1$ and $u_2$ using the hundred-mark system. The scores of the attributes are shown in Table 8.
Based on Eq. (19) and the decision matrix $X_i$, we can obtain the normalized matrix shown in Table 9. To compare the results, as we supposed at the beginning, we set the attribute weights to $w_1 = w_2 = 0.5$. Using Eq. (28), we can obtain the collective evaluation of each attribute, characterized by IVIFNs, in Table 10. The score of the overall evaluation and the preference order of each alternative can be calculated by Definition 2.4 and Definition 2.5, and are summarized in Table 12.
From the data in Table 12, we can see that Yue's method [27] may produce illogical parameters for the alternatives, some of which violate the definition of an IVIFN in this situation (the left endpoint of the interval is greater than the right endpoint, and the interval is less than 0).
To overcome the above limitations, we take the characteristics of interval data into account and reduce the influence of outlier information through weights. A transformation matrix based on principal component analysis, set distribution weights, and a novel mapping strategy are introduced to build a novel method for aggregating interval data into IVIFNs. The improved MADM method is as follows:
Step 1. Establish the decision matrix offered by the decision-makers, which can be described as Eq. (18).
Step 2. To address the loss of information in Yue's method, normalize the attribute values by extending the normalization method of [23].
Step 7. Calculate the group decision with the IVIFNs.
The numbers of elements in the satisfactory set, dissatisfactory set and uncertain set are used in the transformation. Moreover, 0.5 is the natural boundary of [0,1] and also represents the point of maximum hesitation before the linear change. We therefore introduce the proportion of the elements of the hesitation set among all elements to reflect the hesitation information after the linear change.
Step 8. Based on Definition 2.3, calculate the overall evaluation for each alternative.
Step 9. Rank the preference order of the alternatives based on Definition 2.18 and the overall evaluation, according to the proposed score function and accuracy function.
Under the same assumptions as in the example of incorrect preference ranking, we apply the proposed method to the first counter example as follows:
Step 1: The decision matrix offered by the decision-makers is the one described in Example 3.1.
Step 2: According to Eq. (29), the attribute values in the decision matrix $X_i$ are normalized to the matrix $Y_i$, which is shown in Table 14.
Step 7: As in Example 3.2.1, we set the attribute weights to $w_1 = w_2 = 0.5$. The collective evaluation of each alternative, characterized by the IVIFNs in Table 18, is aggregated by Eqs. (36)-(37).
Step 8-9: The score of the overall evaluation and the preference order of each alternative are determined by Definition 2.16 and Definition 2.18, and are shown in Table 19.

Comparison with the example of illogical parameters and faulty evaluation results
Under the same assumptions as in the example of illogical parameters and faulty evaluation results, we use the proposed MADM method to handle the second counter example as follows:
Step 1: The decision matrix offered by the decision-makers is the one described in Example 3.2.
Step 2: According to Eq. (29), the attribute values in the decision matrix $X_i$ are normalized to the matrix $Y_i$, which is shown in Table 21.
Step 3: Based on Eq. (19) and the normalized matrix $Y_i$, we can construct the transformed matrix $R_i$ shown in Table 22. The intermediate results are given in Tables 23-24.
Step 8-9: The score of the overall evaluation and the preference order of each alternative are determined by Definition 2.16 and Definition 2.18, and are shown in Table 26.
... [30]. The air quality data [31] are shown in Tables 27-29.
The lower the values of the three indicators, the better the air quality, and the range from 0 to 300 is the condition under which healthy people can exercise outdoors. According to Step 2, we first normalize the data in Tables 27-29 into the corresponding Tables 30-32 by Eq. (30). We then transform the normalized matrices into the transformed decision matrices by Step 3, as shown in Tables 33-35.
Many practical problems can be characterized as MADM problems. Taking the thousands of air quality records generated every day as an example, it is very difficult or costly to store and analyze such large-scale data directly. However, it becomes easier to save and process air quality information for a period of time through interval values. In this paper, we have developed a straightforward and practical aggregation technique to convert interval values into an IVIFN and applied the proposed MADM method to air quality assessment.
It is worth pointing out that: (1) the method places no restrictions on the data distribution, the sample size, or the number of attributes; (2) the ranking of the alternatives is straightforward and effective when the overall evaluation is quantified by the aggregated IVIFNs; (3) the richer the data, the greater the advantage of the method. Furthermore, the transformation is simple and can be performed easily on a computer.
The major contributions of this paper are fourfold:
(1) A novel score function and accuracy function for ranking IVIFNs are proposed.
(2) The problems of Yue's MADM method are pointed out through counter-examples.
(3) A new methodology for MADM problems in the interval-valued intuitionistic fuzzy setting is established.
(4) An instance analysis is provided to demonstrate the reasonableness and efficiency of the proposed method.
The proposed method can be programmed and implemented easily on a computer. In the future, we will study the application of the proposed MADM method to other decision-making problems, such as big data analysis, pattern recognition and expert systems.
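As a small illustration of how interval values make a period of air quality readings easier to store and process, the Python sketch below reduces a stream of readings to one interval per period. It assumes that each interval is simply the minimum and maximum reading of the period; the paper does not state this construction in the text above, and the function name `to_intervals` and the sample readings are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def to_intervals(readings: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Collapse (period, value) readings into one [min, max] interval per period."""
    intervals: Dict[str, Tuple[float, float]] = {}
    for period, value in readings:
        if period in intervals:
            lo, hi = intervals[period]
            intervals[period] = (min(lo, value), max(hi, value))
        else:
            # First reading of the period: a degenerate interval.
            intervals[period] = (value, value)
    return intervals


# Hypothetical indicator readings for two days.
sample = [
    ("day1", 35.0), ("day1", 80.0), ("day1", 52.0),
    ("day2", 120.0), ("day2", 95.0),
]
daily = to_intervals(sample)
```

Each period is then represented by two numbers regardless of how many readings it contains, which is what makes the interval form attractive for large-scale data before the aggregation into IVIFNs.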

Availability of data and material
The data used to support the findings of this study are included within the article.

Conflicts of interest/Competing interests
We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.