Implementation
Biclique consists of four R functions. The core function, bi.clique, invokes an efficient algorithm to enumerate maximal bicliques. Three utility functions, bi.format, bi.print, and bi.degree, provide formatting and output support.
The bi.clique function takes five arguments, four of which have default values. These five are: an input file name, an input file format (either an edge list (the default) or a binary matrix), two arguments, one for each partite set, that specify the minimum number of vertices required for a maximal biclique to be reported (the default is 3), and an argument specifying the algorithm to use, either MBEA or iMBEA (the default is iMBEA). Pseudocode for MBEA and iMBEA is shown in Algorithm 1. Because iMBEA differs from MBEA by only a handful of additional steps, the two algorithms are presented jointly, with starred lines denoting the steps unique to iMBEA. On dense graphs, iMBEA will usually be the faster algorithm, while on sparse graphs, both algorithms are apt to take about the same amount of time. We therefore recommend the use of iMBEA in most cases. See [13] for a thorough discussion of the two methods.
The three utility functions operate as follows. The bi.print function generates a visual histogram of the distribution of sizes of the maximal bicliques enumerated by the most recent call to bi.clique. The bi.format function augments a list of edges with a header line declaring the number of vertices and edges the list contains, as is required by bi.clique. The bi.degree function reads a bipartite graph and outputs the degree of each vertex.
Application
Biclique is invoked in R as follows:
bicliques = bi.clique(filename, left_least, right_least, version, filetype)
This function generates a list of bicliques, which in the above example are assigned to the bicliques variable. The filename argument is the name of the input file. Using “left” to denote the first partite set and “right” to denote the second, the left_least and right_least arguments specify the minimum number of vertices required from each respective partite set in order for a maximal biclique to be reported. The version argument specifies whether to use MBEA or iMBEA.
The filetype argument can be a little more complicated. It specifies the input file format, which must be either an edge list (0) or a binary matrix (1). The default value is edge list. Such a list is tab-separated, with the first line declaring the number of vertices in each partite set, followed by the number of edges in the graph. Each subsequent line contains a pair of text labels for an edge, with the edge’s left endpoint listed first and its right endpoint second. The binary matrix format is also tab-separated. Example input files are provided with the package.
A sample bipartite graph is depicted in Figure 1, where vertices u1, u2, u3, u4 and u5 are in the left partite set, while v1, v2, v3 and v4 are in the right. This graph is encoded as graph.el, shown in Table 1.
5 4 10
u1 v1
u1 v2
u1 v4
u2 v1
u2 v2
u2 v4
u3 v3
u3 v4
u4 v4
u5 v4
|
Table 1. The encoding of graph.el, stored in edge list format.
The use of bi.clique is exemplified in Sample Invocation 1, where graph.el denotes the sample graph just illustrated and encoded. Since neither left_least nor right_least is specified, all maximal bicliques with at least one edge will be reported. Similarly, since no version argument is declared, iMBEA will be invoked by default. And since no filetype argument is provided, graph.el is assumed to be in edge list format. Summary information returned by bi.clique comprises a listing of the input’s biclique distribution, its total number of bicliques, and its vertex- and edge-maximum biclique sizes.
> bicliques = bi.clique("graph.el", , , ,)
Biclique Number
K5,1 1
K1,2 1
K2,3 1
Number of bicliques : 3
Vertex-maximum biclique : K5,1
Edge-maximum biclique : K2,3
|
Sample Invocation 1. A basic call to bi.clique.
Biclique is available on CRAN at https://cran.r-project.org/web/packages/biclique/index.html. Included is an R-style reference manual with detailed descriptions of all arguments and options. This stable, CRAN-ready version can be installed in R with the command install.packages("biclique"). The latest version of Biclique can be obtained via devtools::install_github("YupingLu/biclique"). Questions or bugs can be submitted to the GitHub webpage. Included in the package are several example bipartite graphs, most of which we obtained from the Koblenz Network Connection [15].
Tests
All tests were conducted on a Dell server with an Intel Xeon E3-1220 v5 3.0GHz processor under the Red Hat Enterprise Linux 7 operating system, with 16GB DDR4 SDRAM, using. R 3.4.2. C code compiled with gcc 4.8.5. Eight bipartite graphs obtained from [15] were studied. As shown in Table 2, timings on them ranged from 0.005 seconds to 21.094 seconds. These tests were not meant to be comprehensive, but instead merely to demonstrate that this software can handle affiliation graphs, authorship graphs, interaction graphs and others in addition to the various biological and random graphs tested in [13].
Graph
|
Left
|
Right
|
Edges
|
Bicliques
|
Vertex-Max
|
Edge-Max
|
Timings
|
S. African Companies
|
6
|
5
|
13
|
8
|
K4,1
|
K3,2
|
0.005
|
Southern Women 2
|
5
|
5
|
14
|
12
|
K3,1
|
K2,2
|
0.005
|
Southern Women 1
|
18
|
14
|
89
|
63
|
K14,1
|
K5,4
|
0.007
|
Club Membership
|
25
|
15
|
95
|
60
|
K21,1
|
K11,2
|
0.006
|
Corporate Leadership
|
20
|
24
|
99
|
66
|
K12,1
|
K9,2
|
0.007
|
American Revolution
|
136
|
5
|
160
|
14
|
K59,1
|
K59,1
|
0.006
|
Crime
|
829
|
551
|
1476
|
620
|
K1,25
|
K1,25
|
0.035
|
arXiv cond-mat
|
16726
|
22015
|
58595
|
21905
|
K1,116
|
K1,116
|
21.094
|
Table 2. Timings on eight sample bipartite graphs.