An Algorithm proposed which is called Sorted Frequent Pattern Growth Tree (SFP-Growth Tree) based on the basic FP-Tree theory designed to challenge a couple of issues. Firstly, in the basic FP-Tree, after the completion of operation, the resulted tree would have not been sorted and finding most frequent patterns requires the tree being sorted or spend the cost of search every time which would not be advantageous. To solve this challenge proposed algorithm creates a tree sorted from the initial levels to the end nodes. Secondly, users may do not want less-frequented patterns while all the transactions have to be inserted in the basic FP-Growth Tree. Due to being initially sorted tree by the proposed algorithm, the undesirable less-frequented patterns could be omitted and the result contain a pruned tree with the most frequented patterns according to the user’s setting. In this method, a node with its metadata is inserted in the tree once and it does not need to be updated until the end of operation. Achieving this aim, reading transactions must be reformed so that the tree grows level by level horizontally. Thus, instead of traversing the tree unregularly inserting nodes, level a must be filled then level a+1 be processed. The figure 1 illustrates nodes inserting sequence direction.
In Table.1, a transaction database is considered as an example that the data would be as the Frequent Items after the first scan[1] (preparation stage, sorting and applying min_support).
Table 1
a transaction database as running example
TID
|
Items in Transaction
|
(ordered) Frequent Items
|
100
|
f, a, c, d, g, i, m, p
|
f, c, a, m, p
|
200
|
a, b, c, f, l, m, o,
|
f, c, a, b, m
|
300
|
b, f, h, j, o
|
f, b
|
400
|
b, c, k, s, p
|
c, b, p
|
500
|
a, f, c, e, l, p, m, n
|
f, c, a, m, p
|
Now, by the proposed algorithm the data should be read according to frequency. As the first entity of transactions are the most frequented item in that transaction construct level 1 of the tree. Then nth entity of each transaction constructs level n. In the example, the data f, c with the frequency of 4, 1 would be positioned at level 1.
It is proposed that in each step of inserting nodes, insert them sorted. Next step, the transaction with f prefix would be read, the frequency of each node calculated and finally all the children created below the parent node at once sorted.
The previous operation repeats for every node on the same level.
This process repeats for every prefix of the tree until whole tree will be constructed sorted. The result SFP-Growth tree for the example would be as figure 4.
Comparing the resulted tree to the tree of basic algorithm, the branches, paths and patterns are similar, however, the SFP-Growth Tree is sorted.