An Incremental Algorithm for Concept Lattice Based on SSIM

As an effective tool for data analysis, Formal Concept Analysis (FCA) is widely used in software engineering and machine learning. The construction of the concept lattice is a key step of FCA, and how to update a concept lattice efficiently remains an open and important problem. To address this problem, we propose an incremental concept lattice algorithm based on image structural similarity (SsimAddExtent). In addition, we analyze its time complexity and report experiments that show the effectiveness of the algorithm.


Introduction
Formal concept analysis is an effective data analysis method built on the framework of lattice theory; it derives from the philosophical description of a concept. Conceptual hierarchies can be extracted from the formal context of a particular domain, which is represented by a binary relation. The concept lattice is the core data structure of formal concept analysis: it describes the essential relationship between objects and attributes and shows the generalization and instantiation relations between concepts. A Hasse diagram is used to visualize the concept lattice. Rule acquisition methods based on concept lattices have been widely applied in many areas, such as data mining [1], software engineering [2], linguistics [3], ontology engineering [4], bioinformatics [5][6] and information retrieval [7][8]. Poelmans [9] summarizes the wide application of FCA-based methods in different fields. However, existing concept lattice construction methods have many disadvantages, such as redundant rule acquisition, noise sensitivity, and high time and space complexity. At present, research on concept lattices mainly covers the construction of concept lattices, knowledge discovery based on concept lattices, and generalizations of the concept lattice model. The construction of the concept lattice is one of the core tasks of FCA, but the number of concepts of a formal context can grow exponentially with the size of the formal context. So far, researchers have proposed a variety of concept lattice construction algorithms, which can be roughly divided into two categories: batch algorithms and incremental algorithms. A batch algorithm first generates all the concepts, then generates the edges from their direct predecessor and successor relationships, and thereby completes the concept lattice. A batch algorithm constructs the concept lattice top-down or bottom-up on the given data, so the concept lattice has to be rebuilt whenever new objects are added.
Batch construction is therefore inflexible, especially for dense formal contexts. The incremental algorithm was proposed by Godin [12] in 1995. The efficiency of an incremental algorithm hinges on locating the position of the added object and updating the edges of the affected nodes. Researchers have made many improvements to the Godin algorithm. Xie Zhipeng [13] uses a tree structure to organize the concept lattice nodes, which can greatly reduce the search space when looking for the parent and child nodes of the added object. Derrick [14] proposed a concept lattice generation algorithm based on set intersection. Li Yun [15][16] proposed an attribute-based incremental concept lattice algorithm and a horizontal merging algorithm for multiple concept lattices, illustrating in particular the effect of the attribute splitting strategy on construction efficiency. Yuan [17] proposed an object-based mechanism that improves concept lattice construction efficiency by tracking the maximum concept. These incremental algorithms can dynamically update and maintain the lattice on the basis of the original concept lattice, which suits contexts that record incrementally growing transaction databases, and they improve efficiency to varying degrees. Incremental algorithms are therefore the main focus of this paper. However, the existing algorithms share common defects: high computational complexity, a high degree of redundancy in the extracted rules, and over-fitting of the sample data.
Many image processing and computer vision applications seek to maximize visual quality and/or minimize perceptual distortion in images, so it is important to develop more accurate computational image quality assessment (IQA) methods to estimate the visual quality of distorted images [18][19][20]. Some objective distortion measures (e.g. MSE: Mean Square Error) are easy to calculate and have a clear physical meaning, but they correlate poorly with the subjective visual quality of the image.
To overcome this problem, the structural similarity index (SSIM) was developed as an objective full-reference image quality assessment index. SSIM [21] is one of the most popular IQA methods because of its high predictability and wide applicability to image quality optimization problems. SSIM is the product of three measurements: luminance (mean), contrast (standard deviation), and structure (correlation). It therefore performs well on optical images [22][23]. Although SSIM is an improvement, it cannot effectively evaluate the quality of severely distorted images, where it may be inconsistent with subjective assessment. To overcome this problem, many researchers have proposed improved SSIM methods. Since the human eye is highly sensitive to image edges, and image gradient and edge information reflect changes in image texture features, Yang established the gradient-based structural similarity index (GSSIM) [24] and the edge-based structural similarity index (ESSIM) [25][26], SSIM-based improvements for optical images that introduce image edges into SSIM. Regression SSIM was proposed in [27] to measure the similarity of images of two similar modes; it is more effective than SSIM in suppressing noise and outliers in data. Marziliano calculates the continuous width of the image to measure the degree of image blur [28]. Grete uses the different gradient changes in images of different resolution to evaluate the blur of color images after Gaussian blurring [29]. Yuan proposed gradient information based on luminance components to evaluate the sharpness of fused images [17].
In this paper, formal concept analysis theory and incremental concept lattice construction algorithms are analyzed. Combined with SSIM, an incremental concept lattice construction algorithm based on SSIM (SsimAddExtent) is proposed. The main ideas are as follows. First, we regard the construction of a concept lattice as essentially a process of knowledge division.
That is, each node on the Hasse diagram is a class with common attributes. Second, when a class is mapped to a graphic, the center point of the class strongly attracts the other points in the class, so the other points lie as close as possible to the center point and away from the class boundary. The mapping graphic of a class therefore has a stable boundary: even if some elements are added to or removed from the class, the boundary of the mapping graphic is unchanged or only slightly changed. In other words, the mapping graphics of a class before and after the change are highly similar. Based on these considerations, we first propose an image structural similarity index based on the Elliptic Fourier Descriptor (EFD-SSIM). Second, we propose an improved incremental concept lattice construction algorithm based on EFD-SSIM (SsimAddExtent). The degree to which each newly added object changes the boundary contour of the mapping graphic of an original node on the Hasse diagram is used as the basis for judging the position of the newly added object. Theoretical analysis and experimental comparison indicate that SsimAddExtent achieves higher construction efficiency than the AddExtent algorithm it is compared with.
The structure of this paper is as follows. In the second section, we introduce some basic definitions and propositions of FCA. The third section describes our algorithm. The fourth section evaluates the performance of the algorithm. Our work is summarized in the fifth section.

Preliminaries
In this section, we introduce the basic notions and conventions of FCA and SSIM. The definitions and propositions can be found in [30][31], to which the reader is kindly referred for a more detailed description. A formal context can be represented by a cross-table or matrix in which every row is an object and every column is an attribute. A cross in the table represents the incidence relation I. Table 1 shows an example of a formal context.  with partial order relation.

Definition 4. If every subset of the set of formal concepts, ordered by the partial order relation, has a supremum and an infimum, the ordered set is a complete lattice, called the concept lattice. Like any partially ordered set, the concept lattice can be represented by a line diagram, or Hasse diagram. In the Hasse diagram, only adjacent nodes are connected by edges. If $(A_1, B_1) \le (A_2, B_2)$, then $(A_1, B_1)$ is drawn lower than $(A_2, B_2)$ in the Hasse diagram. Fig 1 shows the Hasse diagram of the formal context in Table 1.
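To make the notions of formal context, concept and partial order concrete, the following minimal Python sketch enumerates all concepts of a toy context by naive batch closure. The context `CONTEXT`, the helper names and the enumeration strategy are illustrative assumptions and are not taken from Table 1 or from the algorithm proposed in this paper; the order test uses the standard FCA convention that one concept lies below another when its extent is a subset of the other's extent.

```python
from itertools import combinations

# Hypothetical toy context (not Table 1 of the paper): object -> attribute set.
CONTEXT = {
    1: {"a", "b"},
    2: {"b", "c"},
    3: {"a", "b", "c"},
}
ATTRIBUTES = {"a", "b", "c"}


def common_attributes(objects, context, attributes):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attrs, context):
    """Objects that own every attribute in `attrs`."""
    return {obj for obj, owned in context.items() if attrs <= owned}


def all_concepts(context, attributes):
    """Naive batch enumeration: close every object subset and deduplicate."""
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context, attributes)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def is_subconcept(lower, upper):
    """(A1, B1) <= (A2, B2) holds when the extent A1 is a subset of A2."""
    return lower[0] <= upper[0]


concepts = all_concepts(CONTEXT, ATTRIBUTES)
for extent, intent in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

The enumeration visits every subset of objects, so it is exponential and only suitable for tiny examples; it illustrates why batch rebuilding becomes expensive and why incremental updates are attractive.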

Structural similarity index (SSIM)
SSIM was proposed as an image quality assessment algorithm, and its correlation with human perceived quality is higher than that of Mean Square Error (MSE) [21]. SSIM combines the luminance information, contrast information and structural information of the original image and the distorted image.
In this section, we describe the Elliptic Fourier Descriptor and establish an improved SSIM algorithm based on it (EFD-SSIM). After carefully studying FCA theory and incremental algorithms, we observe that constructing a concept lattice is essentially a knowledge classification process: each node of the concept lattice is a set of elements with common attributes. When a new object is added to a concept lattice, EFD-SSIM is used to judge which node on the upper layer of the concept lattice is closest to the new object, which gives the initial position of the new object in the lattice. Once this initial position is found, the final position of the new object and its connections to other nodes are determined in the top-down direction. We therefore establish an incremental algorithm based on EFD-SSIM. The algorithm relies on a graphical representation of the knowledge classification and uses EFD-SSIM to find the node of the concept lattice nearest to the new object, thereby locating the new object.
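For reference, the three SSIM components mentioned in this section (luminance, contrast and structure) can be computed from global statistics as in the sketch below. It follows the commonly used formulation of SSIM over a single window; the function name `ssim_global`, the constants `k1 = 0.01` and `k2 = 0.03`, and the choice `c3 = c2 / 2` are conventional defaults assumed here, not values stated in this paper.

```python
from math import sqrt


def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two equally sized grey-level sequences.

    The stabilising constants are conventional defaults (an assumption here).
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    c3 = c2 / 2.0

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    sd_x, sd_y = sqrt(var_x), sqrt(var_y)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)  # mean
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)  # std deviation
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)  # correlation
    return luminance * contrast * structure


reference = [10, 20, 30, 40]
print(ssim_global(reference, reference))
print(ssim_global(reference, list(reversed(reference))))
```

Identical inputs give a value of 1 up to rounding, while the reversed sequence has the same mean and variance but negative covariance, so only the structure term lowers the score.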

EFD-SSIM
The Fourier Descriptor (FD) is a classic contour-based shape representation originally proposed by Cosgriff (1960). The main idea is to describe the contour features with a set of values that represent the overall frequency content of the shape and that are invariant to operations such as rotation and translation. It remains an active topic in shape representation research. Many researchers have worked on improving shape representation algorithms based on the Fourier operator in order to enhance their representational ability. D. Zhang and G. Lu proposed an enhanced generic Fourier Descriptor to extract the key content of a shape, which overcomes the shortcoming that most descriptors are not suitable for generic shape representation [32]. S. S. Li, Y. D. Huang and J. W. Yang proposed a region-based affine invariant ring Fourier Descriptor for affine invariant feature extraction, which can be used to extract contour features of objects with multiple components [33]. R. Kasaudhan and S. H. Son proposed an enhanced version of the grid distance Fourier Descriptor to calculate image similarity and improve the image matching ratio. B. Belkhaoui and A. Toumi combined the Fourier Descriptor with the watershed (WS) algorithm in a method for automatic target recognition based on inverse synthetic aperture radar images [34]. The principle of the Fourier Descriptor and related work are described in detail below.
First, we define a continuous curve s(t) (see Fig 2). Let $(x_0, y_0)$ be the starting point of the target boundary. Moving along the boundary at a certain speed in the counterclockwise direction, the target boundary can be described by the coordinates of the boundary points.

Fig 2 A continuous curve
The boundary curve s(t) is parameterized by t, the arc length travelled along the boundary. To describe the contour of the image, the selected starting point must move one full circuit along the boundary, so s(t) is a periodic function and can be expanded as a Fourier series, whose coefficients $a_k, b_k$ are called the Fourier Descriptor. The coefficients follow from the orthogonality of the trigonometric functions in Equation (5), together with Equation (4) and Equation (6). Because the curve s(t) is non-continuous, we use the most direct Riemann summation method to approximate the integral and obtain the discrete approximation of Equation (9). Starting from Equation (4), the coefficients in Equation (14) can be derived, and the corresponding conclusion can be proved.
To measure how much the boundary contour of an image changes, the overall structural information of the visible region is no longer extracted; only the structural information of the boundary, represented by the EFD, is used. Equation (1) is therefore modified so that it includes luminance information, distortion information and edge structure information. EFD-SSIM is defined by Equations (18) and (19). The closer EFD-SSIM is to 1, the more similar the boundary contours of the two images are. The standard deviation used in the definition is that of the edge descriptors of the two images.
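The Riemann-sum approximation of the Fourier coefficients described above can be illustrated with a short Python sketch. It assumes that the boundary is given as N samples equally spaced in arc length and uses the textbook Fourier-series coefficient formulas; the function name `fourier_descriptors` and the names `c_k`, `d_k` for the coefficients of the y coordinate are assumptions made for illustration, and the sketch is not a reconstruction of the paper's numbered equations.

```python
from math import cos, sin, pi


def fourier_descriptors(points, order):
    """Riemann-sum estimate of the Fourier coefficients of a closed contour.

    `points` holds N boundary samples (x_i, y_i), assumed equally spaced in
    arc length. Returns (a_k, b_k, c_k, d_k) for k = 1..order, where
    a_k, b_k describe x(t) and c_k, d_k describe y(t).
    """
    n = len(points)
    coeffs = []
    for k in range(1, order + 1):
        a = b = c = d = 0.0
        for i, (x, y) in enumerate(points):
            angle = 2.0 * pi * k * i / n
            a += x * cos(angle)
            b += x * sin(angle)
            c += y * cos(angle)
            d += y * sin(angle)
        scale = 2.0 / n  # (2 / T) * dt with dt = T / N
        coeffs.append((a * scale, b * scale, c * scale, d * scale))
    return coeffs


# Unit circle traversed counterclockwise, sampled at 64 points.
N = 64
circle = [(cos(2 * pi * i / N), sin(2 * pi * i / N)) for i in range(N)]
descriptors = fourier_descriptors(circle, order=3)
for k, coeff in enumerate(descriptors, start=1):
    print(k, [round(v, 6) for v in coeff])
```

For the unit circle only the first harmonic is non-zero, which makes it a convenient sanity check: a contour that changes shape slightly produces small changes in the low-order coefficients, which is the property an EFD-based similarity measure relies on.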

SsimAddExtent
Definition5. Let L, L be the concept lattice before and after inserting the new object m,   gm  is attributes set of m. (A, B) is a concept in L . Then., (1). According to the above definition, we can find that searching modified concept and new concept are very important to increment algorithm. Modified concept is easy to identify because its attributes set is subset of   gm  .We only need to insert new object m into their scope. On the other hand, new concepts need more time in the process because they do not exist until new attributes are inserted. To decide which a concept need to be created, incremental algorithm usually make use of the relationship between generator and new concept, which is defied by definition5. Each new concept (X, Y) has at least one generator, but there is only one canonical generator in all generators. Because the properties, we can create a new concept by recognizing the corresponding canonical generator. The SsimAddExten algorithm proposed in this paper construct concept lattice recursively from the maximum upper bound of the concept lattice, just like the original AddExtent algorithm [35]. Their difference is SsimAddExtent uses EFD-SSIM function to find the maximum general concept called MaximalConcep in the recursive process. The algorithm will be summarized as follows: Assuming L is a concept lattice of formal context K=(U, M, I), Extent is object sets of any node in L, Intent is attributes set of any node in L, 1 B is the attributes corresponding new object S. To insert S into L, the MaximalConcept of S in L should be decided at first, S will be put in the mapping graphic of the direct child node of the root node, respectively, and uses EFD-SSIM algorithm to decide where S should be inserted into. If 0<EFD-SSIM<0.5, S will be not added as new object to the corresponding node. 
If 0.5 < EFD-SSIM < 1, the corresponding node in L is updated to $(Extent \cup S, Intent)$ and serves as the maximum upper bound of the new concepts and modified concepts, namely the MaximalConcept. Thereafter, the attribute set of S is intersected in turn with the attribute set of each direct child node of MaximalConcept. If the intersection is empty, the direct child node is called an invariant node. If the attribute set of the direct child node is a subset of $B_1$, the node is updated to $(Extent \cup S, Intent)$. If the intersection is not empty and differs from the attribute set of every child node of MaximalConcept, then $(Extent \cup S, Intent \cap B_1)$ is called a new node. These steps are repeated recursively until all child nodes of MaximalConcept have been traversed.
SsimAddExtent is as follows:
Input: concept lattice $L_1$; the new object s; $g(s)$, the attribute set of s
Output: concept lattice $L_2$
(1). Find all direct child nodes of the root in $L_1$.
(2). Let n be the number of those nodes.
(3). Map every node of (2) into a graphic.
(4). Add object s to each of those nodes in turn.
(5). Map every node of (4) into a graphic.
(6). For each of the n nodes, compute EFD-SSIM between step (2) and step (5).
If the EFD-SSIM of a direct child node of the root in $L_1$ is greater than 0 and less than 0.5, the direct child node is an unchanged node.
If the EFD-SSIM of the i-th direct child node of the root in $L_1$ is greater than 0.5 and less than 1, the node is an update node and is taken as the MaximalConcept; the node is updated to $(Extent \cup s, Intent)$.
End if
End for
(9). Repeat step (8) to form the complete concept lattice.
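The threshold rule and the classification of the direct child nodes of MaximalConcept described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the representation of a node as a pair of frozensets, and the example nodes are hypothetical, and the EFD-SSIM score is taken as a given number rather than computed.

```python
def classify_by_similarity(score, threshold=0.5):
    """Threshold rule from the text: (0, 0.5) unchanged, (0.5, 1) update."""
    if 0.0 < score < threshold:
        return "unchanged"
    if threshold < score < 1.0:
        return "update"
    return "undefined"  # the text does not cover the values 0, 0.5 and 1


def classify_children(children, new_object, new_attrs):
    """Classify the direct children of MaximalConcept against a new object.

    `children` is a list of (extent, intent) pairs of frozensets and
    `new_attrs` is the attribute set B1 of the new object.
    Returns the lists (invariant, updated, new_nodes).
    """
    child_intents = {intent for _, intent in children}
    invariant, updated, new_nodes = [], [], []
    for extent, intent in children:
        common = intent & new_attrs
        if not common:
            invariant.append((extent, intent))  # empty intersection
        elif intent <= new_attrs:
            updated.append((extent | {new_object}, intent))  # intent within B1
        elif common not in child_intents:
            new_nodes.append((extent | {new_object}, common))  # new node
        # A non-empty intersection equal to an existing child intent is not
        # covered by the description, so it is left unclassified here.
    return invariant, updated, new_nodes


# Hypothetical child nodes; the attribute set {a, d, f, r} is the one used
# for object 15 in the case analysis.
children = [
    (frozenset({1, 2}), frozenset({"a", "d"})),
    (frozenset({3}), frozenset({"b", "c"})),
    (frozenset({4, 5}), frozenset({"a", "f", "g"})),
]
invariant, updated, new_nodes = classify_children(
    children, 15, frozenset({"a", "d", "f", "r"})
)
print(invariant, updated, new_nodes, sep="\n")
```

Keeping the similarity test separate from the set-based classification mirrors the two stages of the description: the score selects where to start, and only the children of the selected node are compared with the new attribute set.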

SsimAddExtent algorithm case analysis
In this section, we use a case to explain how the SsimAddExtent algorithm reduces runtime. Table 2 shows the formal context before adding object 15, while Table 3 shows the formal context after adding object 15. Correspondingly, Fig 3 depicts the concept lattice based on Table 2.
Table 2 The formal context before adding the object 15
Table 3 The formal context after adding the object 15
Consider adding object 15, whose attribute set is {a, d, f, r}. First, the spatial distribution of $C_1$ to $C_6$ before and after adding object 15 is mapped into graphics. To represent high-dimensional data graphically, a dimension reduction step is necessary. In this paper, the dimensionality is reduced with the t-SNE method, and the graph mapping is then performed with the Delaunay triangulation algorithm (see Fig 4). Table 4 shows the EFD-SSIM values of $C_1$ to $C_6$ before and after adding object 15. Since the values of $C_3$, $C_5$ and $C_6$ are greater than 0.5, they are modified nodes. $C_{13}$ and $C_{14}$ are modified concepts, and $C_{15}$ and $C_{16}$ are general old concepts. Similarly, $C_{14}$, a direct child node of $C_5$, is a modified concept, while $C_{10}$ is a general old concept; $C_9$ and $C_{13}$, direct child nodes of $C_6$, are modified concepts, and the others are general old concepts.
The direct child nodes of $C_7$ are $C_{17}$ and $C_{18}$; the direct child node of $C_9$ is $C_{17}$; the direct child nodes of $C_{10}$ are $C_{18}$ and $C_{19}$; the direct child node of $C_{11}$ is $C_{19}$; the direct child node of $C_{12}$ and $C_{13}$ is $C_{20}$; the direct child nodes of $C_{14}$ are $C_{19}$ and $C_{24}$. The concept {(…, 16), (a, f)} is a new concept generated by $C_{18}$, and the others are general old concepts.
Finally, the parent-child relationships between the new concepts and the modified concepts are determined in order to update the edges of the Hasse diagram (see Fig 6).

Experimental Evaluation and Analysis
To demonstrate the efficiency of the proposed algorithm, we implemented it and the original AddExtent algorithm in R. SsimAddExtent has no advantage in worst-case time complexity: its bound is an O(·) expression in L, G and M that includes a cubic factor [36]. However, since EFD-SSIM is introduced and only the child nodes of modified concepts or new concepts are traversed, some unnecessary comparisons are avoided. Therefore, the position of a new node can be decided quickly in a large-scale concept lattice, and the runtime is greatly reduced. The data sets used throughout the experiments were randomly generated with integrity ratios of 10%, 30% and 50%. Each data set has 80 objects, the number of attributes varies, and each attribute may be held by a different number of objects.
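A random formal context of the kind used in these experiments could be generated as in the following Python sketch. It assumes that the integrity ratio is the probability that an object has a given attribute, that is, the expected fraction of filled cells in the cross-table; this interpretation, the function names and the seed are assumptions for illustration, and the experiments reported here were run in R.

```python
import random


def random_context(n_objects, n_attributes, fill_ratio, seed=0):
    """Binary cross-table with independently filled cells.

    `fill_ratio` is read as the integrity ratio (an assumption): each cell
    is 1 with this probability.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < fill_ratio else 0 for _ in range(n_attributes)]
        for _ in range(n_objects)
    ]


def density(context):
    """Fraction of cells equal to 1."""
    cells = sum(len(row) for row in context)
    return sum(map(sum, context)) / cells


# 80 objects as in the experiments; 200 attributes is the smallest setting
# reported for the 10% integrity ratio.
context = random_context(n_objects=80, n_attributes=200, fill_ratio=0.10, seed=42)
print(len(context), len(context[0]), round(density(context), 3))
```

Fixing the seed keeps a generated context reproducible, so that two algorithms can be timed on exactly the same input.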

Fig 7 Results for random datasets with 10% integrity ratio
We compared the runtime of the SsimAddExtent algorithm and the AddExtent algorithm on a dataset with a 10% integrity ratio (see Fig 7). The number of attributes is increased gradually from 200 to 35000, and the number of objects is 80. When the number of attributes is small, the runtimes of the two algorithms differ only slightly. When the number of attributes reaches 2700, the runtime of SsimAddExtent becomes lower than that of AddExtent, and the gap between the two algorithms widens as the number of attributes increases further. Fig 8 shows the difference between the SsimAddExtent algorithm and the AddExtent algorithm on a dataset with a 30% integrity ratio. The number of attributes is increased from 30 to 12000, and the number of objects is unchanged. Under the same conditions, the runtime gap is larger than in Fig 7. The SsimAddExtent algorithm is faster whether the number of attributes is large or small.

Fig 8 Results for random datasets with 30% integrity ratio
We compared the runtime of the SsimAddExtent algorithm and the AddExtent algorithm on a dataset with a 50% integrity ratio (see Fig 9). The number of attributes is increased from 5 to 1200. Because the number of concepts is very large and memory is consumed very quickly, we could only test up to 1200 attributes. Fig 9 shows that the runtime rises quickly. The crossing point of the two algorithms appears much earlier than in Fig 7 and Fig 8. Moreover, the SsimAddExtent algorithm outperforms the AddExtent algorithm.

Conclusions
An incremental algorithm that adds objects to a concept lattice can not only construct the lattice but also update it. In this paper, we introduce a fast and efficient incremental algorithm for constructing concept lattices. The SsimAddExtent algorithm improves on the AddExtent algorithm: it uses EFD-SSIM to determine the MaximalConcept, and the attributes of the new object are compared only with the child nodes of MaximalConcept to determine the modified concepts and new concepts. The algorithm saves runtime because it reduces the number of iterations. SsimAddExtent has a clear advantage over AddExtent on almost every test dataset, even when the number of attributes is small and the dataset has a low integrity ratio. Moreover, the performance gap between the two algorithms grows as the number of attributes increases. Both the theoretical analysis and the performance tests indicate that SsimAddExtent is a better choice than AddExtent when FCA is applied to large-scale data or data with a high integrity ratio.