Measures of uncertainty for a four-hybrid information system and their applications

A four-hybrid information system (4HIS) is an information system in which the dataset of object descriptions consists of categorical, boolean, real-valued and missing data or attributes. This paper studies measures of uncertainty for a 4HIS and their application in attribute reduction. The distance function for each type of attribute in a 4HIS is first provided. Then, this distance is used to produce the tolerance relation induced by a given subsystem of a 4HIS. Next, the information structure of this subsystem is proposed in terms of a set vector, and the dependence between information structures is introduced. Moreover, granulation and entropy measures in a 4HIS are investigated on the basis of information structures. In order to verify the feasibility of the proposed measures, an effectiveness analysis is performed from a statistical perspective. Finally, an application of the proposed measures to attribute reduction in a 4HIS is given.


Research background
Due to the complex diversity of the objective world, uncertainty exists everywhere in real life. Randomness, vagueness and imprecision are the most important facets of uncertainty and can appear anywhere. Uncertainty plays a vital role in practical problems. Uncertainty measurement (UM) helps us understand the nature of various kinds of information and offers new perspectives for data analysis. UM is a significant issue in many research fields, such as machine learning (Xie and Wang 2014), pattern recognition (Gu et al. 2015), medical diagnosis (Hempelmann et al. 2016) and data mining (Delgado and Romero 2016). Granular computing (GrC), proposed by Zadeh (1997, 1998), is a mode of thinking, or a method for solving practical problems, based on granularity structure. Because GrC reflects the global view and the approximate solution ability of human beings when dealing with multilevel and multiperspective problems, it has gradually become an important theory for solving uncertain problems. Information granulation is the basic content of GrC: under a given granulation criterion, an object set is divided into a series of different information granules, and this division is called the process of information granulation. Under dissimilar granulation criteria, different granularity layers can be obtained, which then form a multi-granularity grid structure. A granular structure is a collection of information granules. After Lin (1998) and Yao (2003) discussed the importance of GrC, it caught people's attention. GrC is a superset that integrates many theoretical methods in artificial intelligence, such as rough set theory (RST) (Pawlak 1982), fuzzy set theory (Zadeh 1965), concept lattice (Ma et al. 2007; Wu et al. 2009) and quotient space theory.
RST is a considerable mathematical tool. Not only does it offer new scientific logic and research methods for information science and cognitive science, but it also provides a tool for dealing with uncertainty. Its essential idea is to construct a partition of the universe by means of indistinguishability relations, obtain equivalence classes, and then establish an approximation space. An information system (IS) based on RST is also called a knowledge representation system (Pawlak 1982). An IS can be represented by a data table, with rows labeled by objects of interest, columns labeled by attributes, and entries of the table indicating attribute values. RST has many applications associated with ISs, for instance, uncertainty modeling (Dubois and Prade 1990), reasoning with uncertainty (Greco et al. 2006), rule extraction (Blaszczynski et al. 2011; Mi et al. 2005), and classification and feature selection (Dai et al. 2018; Hu et al. 2010).
In order to systematically assess uncertainty, the notion of entropy from communication theory (Shannon 1948) was introduced to deal with UM. Beaubouef et al. (1998) investigated other approaches to the uncertainty of rough sets. Miao and Hou (2004) proposed some more effective and significant measurement tools, including information entropy, combination entropy and rough entropy. Liang and Qu (2002) introduced a rough metric method for knowledge in an incomplete information system (IIS). Mi et al. (2005) gave some properties of fuzzy approximation operators and a method of uncertainty measurement for generalized fuzzy rough sets. Further work studied uncertainty measurement for a fuzzy relation IS, measured the uncertainty of a fully fuzzy IS by using a Gaussian kernel, and investigated UM for a covering IS. Dai and Tian (2013) studied entropy measures and granularity measures for a set-valued IS.
For GrC in an IS, the information structure is a significant research topic. An equivalence relation is a special kind of similarity between objects from a dataset. Given an IS, each attribute subset determines an equivalence relation. The object set of this IS is divided into disjoint classes by this equivalence relation, and these classes are said to be equivalence classes. If two objects belong to the same equivalence class, then we may say that they cannot be distinguished under this equivalence relation. Thus, each equivalence class is seen as an information granule consisting of indistinguishable objects. The family of all these information granules constitutes a vector; this vector is said to be an information structure in the IS induced by this attribute subset. In other words, information structures in an IS are also granular structures in the sense of GrC. Yu (2018) proposed information structures in an IIS. Zhang et al. (2018) investigated information structures and uncertainty measures in a fully fuzzy IS. Information structures in a covering IS have also been investigated.

Motivation and inspiration
If an IS has many kinds of attributes or data, such as boolean attributes, categorical attributes, real-valued attributes, missing values and so on, then it can be called a multiple-data IS. Zeng et al. (2015) called such an IS a hybrid information system (HIS). How should this kind of hybrid data be processed? Zeng et al. (2015) investigated the measurement problem of mixed data and an incremental updating method for a changing IS. Juhola and Laurikkala (2007) introduced two distance measures in the presence of missing values, which are very useful for studying medical data with mixed-type variables. Han and Kamber (2011) introduced a useful approach for processing hybrid data in a database consisting of six data types. Yu (2019) considered information structures and UM of a hybrid information system with images (HISI).
In practical applications, hybrid data exist everywhere, so it is a very meaningful topic to discuss the UM of such an IS. The main purpose of this paper is to study UM of a 4HIS.
In recent years, some scholars have discussed topics related to information structures and uncertainty in an IS, such as Aggarwal (2017), Chen et al. (2017) and Xie et al. (2019). However, their research lacks numerical experiments and support from large-scale data analysis. To make our work more convincing and complete, this paper provides numerical experiments and data analysis.

Discussion and contribution
In this part, we discuss several references for hybrid data, so as to see the contribution or innovation of this paper more clearly.
(1) Zeng et al. (2015) defined a new distance based on the value difference metric and then constructed a novel fuzzy rough set by combining the distance and a Gaussian kernel. Considering that an IS often varies with time, they analyzed the updating mechanisms for feature selection under variation of the attribute set. Moreover, they presented fuzzy rough set approaches for incremental feature selection on an HIS and proposed two corresponding algorithms. Finally, extensive experiments on eight datasets show that the incremental approaches significantly outperform non-incremental approaches in computational time for feature selection.
(2) Zeng et al. (2017) analyzed the changing mechanisms of the attribute values and fuzzy equivalence relations in a fuzzy rough set and then presented fuzzy rough set approaches for incrementally updating approximations in an HIS. Moreover, they gave two corresponding incremental algorithms. Finally, extensive experiments on eight data sets show that the incremental approaches can effectively improve the performance of updating approximations: they not only significantly shorten the computational time but also increase approximation classification accuracies.
(3) Yu (2019) considered a hybrid information system with images (HISI). First, he developed a new hybrid distance in an HISI. Then, he obtained the fuzzy T_cos-equivalence relation by using a Gaussian kernel. Next, he described information structures in an HISI by set vectors and studied the dependence between them by using the inclusion degree. Finally, he investigated UM for an HISI by means of its information structures.

(4) Yuan et al. [38] introduced fuzzy rough sets to deal with the problem of outlier detection in hybrid data (numerical, categorical). First, they defined the granule outlier degree to characterize the outlier degree of fuzzy rough granules by employing the fuzzy approximation accuracy. Then, they constructed the outlier factor based on fuzzy rough granules by integrating the outlier degree and the corresponding weights to characterize the outlier degree of objects. Furthermore, they designed the corresponding outlier detection algorithm. Finally, they evaluated the effectiveness of the algorithm through experiments on 16 real-world datasets. The experimental results show that the algorithm is more flexible for detecting outliers and is suitable for hybrid data.

(5) Zhang et al. (2016) proposed a fuzzy-rough-set-based information entropy for feature selection on hybrid data (nominal, real-valued). They first proved that the newly defined entropy meets the common requirement of monotonicity and can equivalently characterize the existing feature selection in fuzzy rough set theory. Then, they formulated a feature selection algorithm based on the proposed entropy, and a filter-wrapper method is suggested to select the best feature subset in terms of classification accuracy. Finally, they carried out an extensive numerical experiment to assess the performance of the feature selection algorithm.

(6) This paper deals with hybrid data (categorical, boolean, real-valued and missing data).
The main details are based on the following considerations: a) a 4HIS itself has uncertainty; b) how to define a tolerance relation in a 4HIS; c) an information structure is very helpful for knowledge discovery from a 4HIS; d) the magnitudes of the measured values in a 4HIS can be compared via the dependence between information structures; e) which measure should be chosen to measure the uncertainty of a 4HIS; f) it is necessary to analyze the effectiveness of the proposed measures; g) it is important to give an application of the proposed measures for attribute reduction in a 4HIS.
This paper first provides the distance function for each type of attribute in a 4HIS. This distance is used to produce the tolerance relation induced by a given subsystem. Then, information granules of a 4HIS based on the tolerance relation are constructed, and the information structure formed by the information granules composed of tolerance classes is presented. Next, the dependence between information structures is discussed. By means of this dependence, four kinds of measures to estimate the uncertainty of a 4HIS are put forward. Moreover, an effectiveness analysis of the proposed measures is carried out from a statistical perspective. We find an influence of the θ value on the UM for a 4HIS, which may have potential application value in data mining. Finally, an application of the proposed measures for attribute reduction in a 4HIS is given.

Structure and organization
The work process of the paper is given in Figure 1.
The remaining part of this paper is organized as follows. Section 2 recalls some notions about a 4HIS. Section 3 constructs the distance between the information values of two objects about each type of attribute in a 4HIS and proposes the tolerance relation induced by a given subsystem of a 4HIS. Section 4 describes information structures in a 4HIS and studies the dependence between them. Section 5 introduces some tools for measuring uncertainty of a 4HIS. Section 6 conducts effectiveness analysis for showing the feasibility of these tools. Section 7 gives an application of the proposed measures for attribute reduction in a 4HIS. Section 8 concludes this paper.

Preliminaries
In this section, some basic concepts about a 4HIS are introduced.
Definition 2.1 (Pawlak 1991) Suppose that X is a finite set of objects and AT is a finite set of attributes. Then the ordered pair (X, AT) is referred to as an information system (IS) if every attribute a ∈ AT determines a function a : X → Y_a, where Y_a denotes the value set of the attribute a.

Let (X, AT) be an IS. If there is a ∈ AT such that * ∈ Y_a, where * means a null or unknown value, then (X, AT) is called an incomplete information system (IIS).
If an IIS (X, AT) satisfies AT = A_cat ∪ A_boo ∪ A_rea, where A_cat, A_boo and A_rea are the categorical, boolean and real-valued attribute sets, respectively, then (X, AT) is called a four-hybrid information system (4HIS).

Example 2.3 In Table 1, the categorical attribute "Headache", the boolean attribute "Muscle pain", the real-valued attribute "Temperature" and the categorical attribute "Symptom" are denoted by a_1, a_2, a_3 and a_4, respectively, and "*" indicates the missing value. Then Table 1 is a 4HIS.

Let Y*_a denote the set of non-missing information values of the attribute a. Then Y*_a = Y_a \ {*}.
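As a concrete illustration, Y*_a can be extracted directly from a data table. The snippet below is a minimal sketch in which a hypothetical table is stored column-wise and None plays the role of "*"; the column names are illustrative, echoing Table 1.

```python
# Hypothetical 4HIS fragment: columns are attributes, None stands for "*".
table = {
    "Headache":    ["yes", "no", None, "yes"],   # categorical
    "Muscle pain": [True, None, False, True],    # boolean
    "Temperature": [38.2, 36.5, None, 39.0],     # real-valued
}

def non_missing_values(table, a):
    """Y*_a: the set of non-missing information values of attribute a."""
    return {v for v in table[a] if v is not None}

print(non_missing_values(table, "Headache"))  # {'yes', 'no'} in some order
```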

Tolerance relations in a 4HIS
In this section, the distance between the information values of two objects about each type of attribute in a 4HIS is first constructed. Then, the tolerance relation induced by a given subsystem of a 4HIS is proposed.

The distance function for each type of attribute in a 4HIS
For missing data, we adopt the following viewpoint.

1) Consider x ≠ y, a(x) = *, a(y) ≠ *, a ∈ A. Because a(x) is treated as a "do not care" condition, a(x) has the probability 1/|Y*_a| of being equal to any given value of Y*_a.

2) Consider x ≠ y, a(x) ≠ *, a(y) = *, a ∈ A. Symmetrically, a(y) has the probability 1/|Y*_a| of being equal to any given value of Y*_a.

3) Consider x ≠ y, a(x) = *, a(y) = *, a ∈ A. Then a(x) and a(y) both have the probability 1/|Y*_a| of being equal to any given value of Y*_a, so the joint probability of a(x) and a(y) taking a given pair of values is 1/|Y*_a|².

For x ≠ y, a(x) ≠ *, a(y) ≠ *, a(x) ≠ a(y), a ∈ A_boo, according to the opinion of Zeng et al. (2015), define dis(a(x), a(y)) = 1. For x ≠ y, a(x) ≠ *, a(y) ≠ *, a(x) ≠ a(y), a ∈ A_cat, according to the opinion of Yu (2019), define dis(a(x), a(y)) = 1. In this way, the following definition is proposed.
Definition 3.1 Let (X, AT) be a 4HIS, a ∈ AT and x, y ∈ X. Then the distance dis(a(x), a(y)) between a(x) and a(y) is defined case by case, according to the type of the attribute a and the treatment of missing values above.
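Since the distance is stated case by case, the sketch below shows one plausible reading of it: a 0/1 distance for categorical and boolean values, a normalized difference for real values, and the expected distance under the probabilistic treatment of missing values described above. The function name dis and the kind labels are illustrative assumptions, not the paper's exact notation.

```python
def dis(x_val, y_val, values, kind):
    """A plausible per-attribute distance (a sketch, not the paper's exact
    Definition 3.1). `values` is Y*_a, the set of non-missing values;
    `kind` is "categorical", "boolean" or "real". None stands for "*"."""
    vals = list(values)
    if kind == "real":
        span = max(vals) - min(vals) or 1.0
        d = lambda u, v: abs(u - v) / span        # normalized difference
    else:
        d = lambda u, v: 0.0 if u == v else 1.0   # 0/1 distance
    if x_val is not None and y_val is not None:
        return d(x_val, y_val)
    if x_val is None and y_val is None:
        # both missing: each value pair has joint probability 1/|Y*_a|^2
        return sum(d(u, v) for u in vals for v in vals) / len(vals) ** 2
    known = y_val if x_val is None else x_val
    # one side missing: it equals each v in Y*_a with probability 1/|Y*_a|
    return sum(d(known, v) for v in vals) / len(vals)
```

For instance, dis(None, "yes", {"yes", "no"}, "categorical") gives 0.5, the expected 0/1 distance over the two equally likely completions of the missing value.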

Example 3.2 (Continued from Example 2.3) (1) Since a_1 is a categorical attribute, the distances between its values follow from Definition 3.1. (3) Since a_3 is a real-valued attribute, its distances likewise follow from Definition 3.1. Here, every element of the decision attribute set D is viewed as a categorical attribute.
Then M a is referred to as the distance matrix of the attribute a in (X , AT ).

The tolerance relation induced by a given subsystem of a 4HIS
Below, the tolerance relation induced by a given subsystem of a 4HIS is established.
Then R^θ_A is referred to as the tolerance relation induced by the subsystem (X, A) with respect to θ.
Then R^θ_A(x) is referred to as the tolerance class of the object x under the tolerance relation R^θ_A.

Proposition 3.6 Let (X, AT) be a 4HIS. Then the following properties hold: (1) If A ⊆ B ⊆ AT, then for any θ ∈ [0, 1] and x ∈ X, R^θ_B(x) ⊆ R^θ_A(x). (2) If 0 ≤ θ_1 ≤ θ_2 ≤ 1, then for any A ⊆ AT and x ∈ X, R^{θ_1}_A(x) ⊆ R^{θ_2}_A(x).

In what follows, an algorithm for computing R^θ_A is designed.

Information structures in a 4HIS
In this section, information structures in a 4HIS are studied.

The concept of information structures in a 4HIS
Definition 4.1 Suppose that (X, AT) is a 4HIS and A ⊆ AT. Pick θ ∈ [0, 1]. Put InS^θ(A) = (R^θ_A(x_1), R^θ_A(x_2), ..., R^θ_A(x_n)). Then InS^θ(A) is referred to as the θ-information structure of the subsystem (X, A).
(1) If R^{θ_1}_A(x_i) ⊆ R^{θ_2}_B(x_i) for any i, then InS^{θ_2}(B) is said to depend on InS^{θ_1}(A). It is written as InS^{θ_1}(A) ⪯ InS^{θ_2}(B).
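Under this reading, the two notions can be sketched directly: an information structure is the tuple of tolerance classes, and dependence is componentwise inclusion. The function names are illustrative.

```python
def information_structure(classes):
    """InS^theta(A): the set vector (R(x_1), ..., R(x_n)) of tolerance
    classes, frozen so structures can be compared."""
    return tuple(frozenset(c) for c in classes)

def depends_on(ins_a, ins_b):
    """ins_a 'precedes' ins_b when every class of ins_a is contained in
    the corresponding class of ins_b (a sketch of the dependence relation)."""
    return all(ca <= cb for ca, cb in zip(ins_a, ins_b))

fine   = information_structure([{0}, {1}, {2}])
coarse = information_structure([{0, 1}, {0, 1}, {2}])
print(depends_on(fine, coarse))  # True
```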

UM of a 4HIS
In this section, we study the UM of a 4HIS.
This theorem shows that when the available information becomes coarser, the θ-information granulation increases, and when the available information becomes finer, the θ-information granulation decreases. In other words, the greater the uncertainty of the existing information, the greater the value of the θ-information granulation. Therefore, we can draw the conclusion that the θ-information granulation introduced in Definition 5.2 can be used to evaluate the uncertainty degree of a 4HIS.
Proposition 5.5 Let (X , AT ) be a 4HIS. Then the following properties hold: Proof These follow from Theorems 4.8 and 5.4(1).
This theorem shows that when the structure of hybrid information becomes finer, the θ-information amount increases, and when the hybrid information structure becomes coarser, the θ-information amount decreases.
Proposition 5.8 Let (X , AT ) be a 4HIS. Then the following properties hold: Proof These follow from Theorems 4.8 and 5.7(1).
This theorem shows that the greater the uncertainty of the available information, the greater the θ-rough entropy. Therefore, we can draw the conclusion that the θ-rough entropy proposed in Definition 5.11 can be used to evaluate the degree of determination of a 4HIS.
Proposition 5.14 Let (X , AT ) be a 4HIS. Then the following properties hold: Proof It can be proved by Theorems 4.8 and 5.13(1).
This theorem shows that when the structure of hybrid information becomes finer, the θ-information entropy increases, and when the hybrid information structure becomes coarser, the θ-information entropy decreases.
Proposition 5.17 Suppose that (X, AT) is a 4HIS. Then the following properties hold: Proof It follows from Theorems 4.8 and 5.16(1).
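To make the four measures concrete, here is a hedged sketch using common structure-based formulas for the granulation G^θ, information amount E^θ, rough entropy E^θ_r and information entropy H^θ. The paper's exact Definitions 5.2, 5.6, 5.11 and 5.15 are not reproduced verbatim, so these formulas are assumptions chosen only to exhibit the monotonic behavior described above.

```python
from math import log2

def measures(classes):
    """Structure-based uncertainty measures over tolerance classes
    (assumed formulas, not the paper's exact definitions)."""
    n = len(classes)
    sizes = [len(c) for c in classes]
    G  = sum(sizes) / n ** 2                      # granulation G^theta
    E  = sum(1 - s / n for s in sizes) / n        # information amount E^theta
    Er = sum(log2(s) for s in sizes) / n          # rough entropy E^theta_r
    H  = -sum(log2(s / n) for s in sizes) / n     # information entropy H^theta
    return G, E, Er, H

fine   = [{0}, {1}, {2}, {3}]                     # each object alone
coarse = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]         # larger granules
gf, ef, erf, hf = measures(fine)
gc, ec, erc, hc = measures(coarse)
assert gf < gc and erf < erc    # coarser structure -> larger G and E_r
assert ef > ec and hf > hc      # finer structure   -> larger E and H
```

These inequalities mirror the monotonicity stated in the theorems of this section: G^θ and E^θ_r grow with coarsening, while E^θ and H^θ grow with refinement.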

Experiments and analysis
In this section, we design a numerical experiment and do effectiveness analysis to evaluate the proposed measures.

A numerical experiment
In order to show the performance of the proposed measures of uncertainty in a 4HIS, we select nine data sets from the UCI Repository of machine learning databases, described in Table 2, where each data set can be expressed as a 4HIS. We carry out numerical experiments on these nine data sets.
(2) When the attribute subset A is given, G^θ and E^θ_r are both monotonically increasing as the threshold θ increases. Meanwhile, E^θ and H^θ are both monotonically decreasing as the threshold θ grows. This shows that the uncertainty of a 4HIS increases as the threshold θ increases.
Thus, G^θ, E^θ, E^θ_r and H^θ can all be applied to measuring the uncertainty of a 4HIS.

Dispersion analysis
In this part, the standard deviation is applied to do effectiveness analysis of the proposed measures.
Let X = {x_1, ..., x_n} be a data set. Then the arithmetic average value, the standard deviation and the standard deviation coefficient of X are defined as follows:

x̄ = (1/n) Σ_{i=1}^{n} x_i,  σ = sqrt((1/n) Σ_{i=1}^{n} (x_i − x̄)²),  CV = σ / x̄.
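These three statistics can be computed directly; the sketch below uses the population standard deviation, which is an assumption since the text does not specify the variant.

```python
from math import sqrt

def mean_std_cv(xs):
    """Arithmetic mean, (population) standard deviation, and the standard
    deviation coefficient CV = sigma / mean."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean, sigma, sigma / mean

mean, sigma, cv = mean_std_cv([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, sigma, cv)  # 5.0 2.0 0.4
```

Because CV is dimensionless, it allows the dispersion of the four measure sets to be compared on a common footing.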
Then, according to the above experiments, the CV values of the measure sets on each data set are computed (see Figs. 10-18). From Figs. 10-18, the following conclusions are obtained: (1) When the threshold θ is changing, X^{θ_j}_E is the minimum of the four measure sets on each data set, except for X^{θ_6}_E(Ec), X^{θ_7}_E(Ec), X^{θ_8}_E(Ec) and X^{θ_9}_E(Ec). This shows that the dispersion degree of E^θ is minimal in most cases.
(2) When the attribute subset A_i is changing, X^{A_i}_E is the minimum of the four measure sets on each data set, except on He. This shows that the dispersion degree of E^θ is minimal in most cases. Therefore, E^θ performs much better in measuring the uncertainty of 4HISs.
By summarizing the above experiments, the following results are obtained: (1) If people need only monotonicity, then G^θ, E^θ, E^θ_r and H^θ can all be used to measure the uncertainty of a 4HIS; (2) if people investigate only the dispersion degree, then E^θ performs better in measuring the uncertainty of a 4HIS; (3) if people consider both monotonicity and dispersion degree, then E^θ performs best in measuring the uncertainty of a 4HIS.

An application
In this part, an application of the proposed measures for attribute reduction in a 4HIS is given. In this paper, the family of all θ-coordination subsets (resp., all θ-reducts) of AT is denoted by co_θ(AT) (resp., red_θ(AT)).
A ∈ co θ (AT ). Proof It can be proved by Theorem 7.5.
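A brute-force sketch of θ-reduct search follows. It assumes that A is a θ-coordination subset when it induces the same tolerance classes as the full attribute set AT, and that a θ-reduct is a minimal such subset; the averaging of per-attribute distances is likewise an assumption, and all names are illustrative.

```python
from itertools import combinations

def classes_of(dist_mats, attrs, theta):
    """Tolerance classes under the attribute subset `attrs`.
    dist_mats maps attribute name -> n x n distance matrix; objects are
    tolerant when the averaged distance is <= theta (assumed rule)."""
    n = len(next(iter(dist_mats.values())))
    return tuple(frozenset(
        j for j in range(n)
        if sum(dist_mats[a][i][j] for a in attrs) / len(attrs) <= theta)
        for i in range(n))

def reducts(dist_mats, theta):
    """All minimal theta-coordination subsets: subsets A inducing the same
    tolerance classes as the full attribute set AT (exhaustive search)."""
    at = sorted(dist_mats)
    target = classes_of(dist_mats, at, theta)
    found = []
    for k in range(1, len(at) + 1):
        for A in combinations(at, k):
            if classes_of(dist_mats, A, theta) == target \
               and not any(set(r) <= set(A) for r in found):
                found.append(A)
    return found

m = [[0.0, 1.0], [1.0, 0.0]]
dm = {"a1": m, "a2": m}   # a2 duplicates a1, so either attribute suffices
print(reducts(dm, 0.5))   # [('a1',), ('a2',)]
```

Exhaustive search is exponential in |AT|; a practical algorithm would instead add or delete attributes greedily, guided by one of the uncertainty measures above.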

Conclusions
An HIS contains many types of attributes, so it is more difficult to measure than an IS with a single type of attribute. This paper has measured the uncertainty of a 4HIS, which contains four types of data or attributes, and given its application in attribute reduction. First, a novel distance function for each type of attribute in a 4HIS has been proposed. The proposed distance is more consistent with reality in measuring the difference between two information values on each type of attribute. Then, the tolerance relation has been produced by using the proposed distance. In addition, the information granules composed of the tolerance classes have been constructed, and the information structure formed by the information granules has been presented. Next, four UMs based on the structures have been investigated. Furthermore, the effectiveness of the four measures has been verified by statistical analysis. As an application of the proposed UMs, attribute reduction has been studied. We have found an influence of the θ value on the UM for a 4HIS, which may have potential application value in data mining. This paper provides a new idea of UM for hybrid data. Its disadvantage is that attribute reduction algorithms are not given. In the future, we will continue to explore attribute reduction algorithms in a 4HIS based on its UM.