FCHM-stream: fast closed high utility itemsets mining over data streams

The high-speed, continuous and endless nature of data streams makes it challenging to quickly mine high utility itemsets in limited memory. The sliding window model, which focuses only on the most recent data, has received extensive research attention because it adapts well to the data stream environment. However, the many communal batches shared by adjacent sliding windows cause an algorithm to repeatedly generate a large number of identical itemsets, which degrades its spatiotemporal performance. To solve these problems and provide users with a concise and lossless resultset, a new closed high utility pattern mining algorithm over data streams, named FCHM-Stream, is proposed. A new utility list structure based on batch division and a resultset maintenance strategy based on the skip-list structure are designed to effectively reduce the repeated generation of identical itemsets and thus reduce the running time of the algorithm. Extensive experimental results show that the proposed algorithm achieves a large runtime improvement over state-of-the-art algorithms.


Introduction
High utility pattern mining (HUIM) is an important research topic in the field of knowledge discovery and data mining. By comprehensively considering the internal utility (purchase quantity) and external utility (profit) of each item, HUIM solves the problem that traditional frequent itemset mining (FIM) only considers the frequency of patterns and cannot find patterns with higher profits [1]. Moreover, the generalized notion of utility provides decision-makers with more flexibility [2]. The internal and external utility of itemsets are not limited to quantity and profit; users can give them a more flexible meaning according to the application domain. For example, in text mining, the term frequency (TF) and the inverse document frequency (IDF) can be used as the internal utility and external utility, respectively, and their product TF-IDF is used as the utility to mine a set of meaningful words in the text [3]. There is extensive literature on high utility itemset mining algorithms, which can be mainly categorized as candidate generation-and-test [4], pattern growth-based [5], utility list-based [2,6-10], and projection-based methods [1,11].
In recent years, high utility pattern mining over data streams has gradually attracted the attention of researchers. Many real-life application scenarios generate data streams in real time, such as clickstreams of websites, transactions in retail stores, and sensor data from IoT devices. Data streams are high-speed, continuous, and endless sequences of transactions, so it is impossible to store the entire data in memory. Moreover, it is generally believed that old information may be unimportant at the moment [12], and recent data have more impact than older data [13]. The sliding window model is widely used for stream mining because of its emphasis on recent data and its limited memory requirements. For this reason, many algorithms [3,12-17] process data streams under the sliding window model. With fixed batches in the window, the sliding window model stores only the recent data in the data stream [14]. As a new batch arrives, the sliding window removes the oldest batch and inserts the new one. Therefore, the number of communal batches between two adjacent windows equals the window size minus one. These communal batches generate a large number of identical itemsets, and repeatedly generating them consumes a lot of time.
Moreover, generating a large number of itemsets is an inherent problem of high utility pattern mining, which consumes a lot of time and memory [18]. For the user, analyzing a huge resultset also consumes a lot of time. For this reason, Tseng et al. first proposed closed high utility itemsets (CHUIs), which are a compact and lossless representation of HUIs [18]. Currently, many algorithms have been proposed to discover CHUIs in transaction databases, such as CHUD [19], CHUI-Miner [18], EFIM-Closed [20], CLS-Miner [21], and Hminer-Closed [22]. However, in dynamic data environments there are few algorithms for mining CHUIs. To our knowledge, only the IncCHUI [23] algorithm and the CHUI-DS [16] algorithm mine CHUIs in incremental databases and data streams, respectively. The CHUI-DS algorithm is inefficient because it considers all items in the sliding window during the search, lacks an effective pruning strategy, and repeatedly regenerates the resultset.
To solve these problems, we propose the FCHM-Stream algorithm based on the sliding window model, which uses a resultset maintenance strategy built on the skip-list structure to quickly and efficiently search for closed high utility itemsets over data streams. The contributions of this paper are as follows:
(1) A new utility list structure, CN-list, based on batch division is proposed. It divides the batch data in the sliding window into two parts and considers only items in the new batch during the search process, thus effectively reducing the search space and improving time efficiency.
(2) A resultset maintenance strategy (abbreviated as RMS) based on the skip-list structure is proposed to solve the problem of repeatedly generating the same resultset in the sliding window model for the first time as well as reducing the execution time of the algorithm.
(3) A new closed high utility pattern mining algorithm over data streams, called FCHM-Stream, is proposed based on the CN-list structure and the RMS strategy; it can effectively mine the set of closed high utility itemsets over a data stream. Extensive experiments on six datasets show that the CN-list and the RMS strategy effectively improve the time efficiency of the algorithm.
The rest of this paper is organized as follows. Section 2 introduces related work on high utility pattern mining algorithms, sliding window-based high utility itemset mining over data streams, and closed high utility pattern mining algorithms. Section 3 presents the preliminaries. Section 4 describes the CN-list structure, the RMS strategy, and the FCHM-Stream algorithm in detail. Section 5 discusses and analyzes the experimental results. Section 6 summarizes the work of this paper and proposes future research directions.

Preliminaries
We first present the basic definitions of closed high utility pattern mining over data streams.
Let I = {I1, I2, ..., In} be a set of items. Each item Ii is associated with a positive number P(Ii), called the external utility of Ii, which represents the relative importance of the item. A transaction Tc ⊆ I, where c is the unique identifier of the transaction Tc, called its Tid. Each item Ii in the transaction is associated with a positive number q(Ii, Tc), called the internal utility (quantity) of the item Ii in the transaction Tc. Let DS = {T1, T2, ..., Tm} be a data stream containing a sequence of transactions. An itemset X ⊆ I is a subset of I.

Definition 1 (sliding window model).
A sliding window model divides the data stream DS into multiple batches and uses only a limited number of batches to maintain the most recent data [17,24]. In a data stream mining algorithm based on the sliding window model, each window Wk = {B(i+1), B(i+2), ..., B(i+v)} is composed of a fixed number of batches, and the window size of Wk is v. Each batch Bl = {T(j+1), T(j+2), ..., T(j+z)} contains a fixed number of transactions and has a batch size of z. Figure 1 shows an example of a data stream and a sliding window model on it, and Table 1 presents the external utility of all items in the data stream. Each row in Fig. 1 represents a transaction.

Definition 2 (Utility of an item). The utility of an item Ii in a transaction Tc is denoted as u(Ii, Tc) and is defined as the product of the item's internal and external utility, i.e., u(Ii, Tc) = q(Ii, Tc) × P(Ii). In a batch Bj, the utility of item Ii is defined as u(Ii, Bj) = Σ_{Ii∈Tc ∧ Tc∈Bj} u(Ii, Tc). In the window Wk, the utility of the item Ii is defined as u(Ii, Wk) = Σ_{Bj∈Wk} u(Ii, Bj).

Definition 3 (Utility of an itemset). The utility of a k-itemset X = {I1, I2, ..., Ik} in a transaction Tc is denoted as u(X, Tc) and is defined as u(X, Tc) = Σ_{Ii∈X} u(Ii, Tc).
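To make the utility definitions above concrete, the following Python sketch computes item, itemset, and window utilities on a toy stream of two batches. The data, quantities, and function names are all illustrative (not the values of Fig. 1 / Table 1):

```python
# External utilities P(i): unit profit of each item (illustrative).
EXTERNAL = {"a": 2, "b": 3, "c": 1}

# A batch is a list of (tid, {item: quantity}) pairs; a window is a
# list of batches, matching Definition 1.
B1 = [(1, {"a": 2, "b": 1}), (2, {"b": 3, "c": 4})]
B2 = [(3, {"a": 1, "c": 2})]
W = [B1, B2]

def u_item(item, trans):
    # u(Ii, Tc) = q(Ii, Tc) * P(Ii)
    return trans[item] * EXTERNAL[item]

def u_itemset(itemset, trans):
    # u(X, Tc): sum of item utilities; 0 if Tc does not contain X
    if not set(itemset) <= trans.keys():
        return 0
    return sum(u_item(i, trans) for i in itemset)

def u_window(itemset, window):
    # u(X, Wk): sum over every transaction of every batch in the window
    return sum(u_itemset(itemset, t) for batch in window for _, t in batch)

print(u_item("a", B1[0][1]))            # 2 * 2 = 4
print(u_itemset(("a", "b"), B1[0][1]))  # 4 + 3 = 7
print(u_window(("a",), W))              # 4 + 0 + 2 = 6
```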

Definition 4 (Transaction utility). The utility of a transaction Tc is denoted as TU(Tc) and is defined as TU(Tc) = Σ_{Ii∈Tc} u(Ii, Tc).
Definition 5 (Support and TidSet) [21]. The TidSet of an itemset X is the set of the transaction identifiers of the transactions that contain X, denoted as TidSet(X). The support of an itemset X is denoted as sup(X) and is defined as sup(X) = |TidSet(X)|.
Definition 6 (Closed itemset) [23]. An itemset X is a closed itemset in the window Wk if there exists no proper superset Y ⊃ X such that sup(Y) = sup(X).
Definition 7 (High utility itemsets) [6]. In a window Wk, if the utility of an itemset X is no less than the user-defined minimum utility threshold minutil, i.e., u(X, Wk) ≥ minutil, then X is a high utility itemset.
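Definitions 5-7 can be illustrated together with a small Python sketch. Here each transaction maps items directly to their utility u(i, Tc) in that transaction; all numbers and names are illustrative, chosen so that item a is absorbed by the superset {a, b} with equal support and is therefore not closed:

```python
WINDOW = {  # tid -> {item: u(item, Tc)} (toy values)
    1: {"a": 4, "b": 3},
    2: {"a": 2, "b": 9, "c": 4},
    3: {"c": 2},
}

def tidset(X):
    # TidSet(X): ids of transactions containing every item of X
    return {tid for tid, t in WINDOW.items() if set(X) <= t.keys()}

def support(X):
    return len(tidset(X))

def utility(X):
    return sum(sum(t[i] for i in X)
               for t in WINDOW.values() if set(X) <= t.keys())

def is_closed(X):
    # Definition 6: no proper superset of X has the same support
    items = {i for t in WINDOW.values() for i in t}
    return all(support(set(X) | {y}) < support(X)
               for y in items - set(X))

minutil = 15
print(is_closed(("a",)))               # False: {'a','b'} has equal support
print(is_closed(("a", "b")))           # True
print(utility(("a", "b")) >= minutil)  # True (18 >= 15): a CHUI candidate
```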

Definition 8 (Identical itemsets).
An itemset X and an itemset Y are identical itemsets if they are composed of the same items and have the same support and utility values. For example, the support and the utility of itemset bdf are the same in W1 and W2, so the itemset bdf in W1 and the itemset bdf in W2 are identical itemsets.
Definition 9 (Closed high utility itemset (CHUI)) [19]. An itemset X is a closed high utility itemset if it is a closed itemset in the window W k , and satisfies u(X , W k ) ≥ minutil.
Definition 10 (Transaction weighted utility (TWU)) [6]. The TWU of an itemset X is the sum of the utilities of all transactions in the current window that contain X, denoted as TWU_Wk(X) and defined as TWU_Wk(X) = Σ_{X⊆Tc ∧ Tc∈Wk} TU(Tc). For example, in window W1, the TWU of item a is the sum of TU(Tc) over all transactions of W1 that contain a.

Property 1 (Transaction weighted utilization downward closure property) [18]. For an itemset X in window Wk, if TWU_Wk(X) < minutil, then the itemset X and all its supersets are low utility.
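A minimal Python sketch of Definition 10 and the downward-closure prune of Property 1, on toy per-transaction utilities (all values illustrative):

```python
WINDOW = {  # tid -> {item: u(item, Tc)} (toy values)
    1: {"a": 4, "b": 3},
    2: {"b": 9, "c": 4},
    3: {"a": 2, "c": 2},
}

def tu(trans):
    # TU(Tc): total utility of transaction Tc
    return sum(trans.values())

def twu(X):
    # TWU(X) = sum of TU(Tc) over all Tc in the window containing X
    return sum(tu(t) for t in WINDOW.values() if set(X) <= t.keys())

# Property 1: if TWU(X) < minutil, X and every superset are low
# utility, so unpromising single items can be discarded up front.
minutil = 12
promising = sorted(i for i in {"a", "b", "c"} if twu((i,)) >= minutil)
print(promising)  # ['b', 'c']  (TWU: a=11, b=20, c=17)
```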
Definition 11 (Utility list structure (utility list)) [18]. Let ≺ be a total order on I. The utility list of an itemset X is denoted as ul(X). It uses a tuple <tid, iutil, rutil> to represent the utility information of each transaction containing X, where tid is the transaction identifier, iutil is the utility of X in the transaction, and rutil is the remaining utility of X in the transaction.

Property 2
Remaining utility pruning strategy [18]. Let X be an itemset and let Y be an extension of X obtained by appending an item y to X such that y ≻ i for all i ∈ X. If the sum of the iutil and rutil values in the utility list of X is less than the minimum utility threshold minutil, then X and all its extensions are low-utility itemsets.
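The utility-list tuple of Definition 11 and the prune of Property 2 can be sketched as follows; the entries and threshold are illustrative:

```python
from collections import namedtuple

# One utility-list entry per transaction containing X (Definition 11).
Entry = namedtuple("Entry", "tid iutil rutil")

ul_X = [Entry(tid=1, iutil=4, rutil=3),
        Entry(tid=3, iutil=2, rutil=2)]

def utility(ul):
    # u(X): sum of iutil values
    return sum(e.iutil for e in ul)

def can_extend(ul, minutil):
    # Property 2: sum(iutil + rutil) upper-bounds X and all extensions;
    # if it is below minutil, the whole subtree of X can be pruned.
    return sum(e.iutil + e.rutil for e in ul) >= minutil

print(utility(ul_X))         # 6
print(can_extend(ul_X, 10))  # True: 11 >= 10, keep exploring
print(can_extend(ul_X, 12))  # False: 11 < 12, prune X's subtree
```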

High utility itemsets mining
High utility pattern mining methods can be roughly categorized as candidate generation-and-test-based, pattern growth-based, utility-list-based, and projection-based [2]. Liu et al. proposed the Two-Phase [4] algorithm to discover HUIs. First, it uses transaction-weighted utility to restrict the search space; then an extra database scan filters out the low-utility itemsets. Since the algorithm follows the candidate generation-and-test method, too many candidates are generated and the database must be scanned many times, so its runtime performance and scalability are poor.
To solve the limitation of the method based on candidate generation and test, Tseng et al. proposed the UP-Growth [5] algorithm based on pattern growth. The algorithm devised a compressed tree structure UP-tree, and HUIs can be generated from UP-tree by only scanning the database twice.
Liu et al. introduced the HUI-Miner [6] algorithm with a utility list structure. The algorithm builds the utility list of each single item to maintain the utility information of the database, but it must perform a large number of join operations, which makes it inefficient. The HUP-Miner [7] algorithm proposed two new pruning strategies, PU-prune and LA-prune, to prune the search space, further improving the performance of HUI-Miner. The FHM [8] algorithm proposed a novel pruning strategy named EUCP, which reduces the number of join operations by considering the TWU of co-occurring items; this strategy is especially effective on sparse datasets. The ULB-Miner [9] algorithm designed a utility list buffer structure to efficiently store and retrieve utility lists, and proposed an efficient method to construct them, reducing both time and memory consumption in the mining process. The UBP-Miner [10] algorithm proposed a Utility Bit Partition (UBP) structure based on the utility list buffer that improves the efficiency of traversing elements during itemset extension, and a bitwise operation, BEO, to speed up utility list construction and reduce the time complexity of the algorithm. The Hminer [2] algorithm proposed a compact utility list structure named CUL to compress the utility information of itemsets, introduced the concepts of closed and non-closed utility of itemsets, and used the LA-prune and C-prune pruning strategies; it performs well on both dense and sparse datasets.
EFIM [1] introduced two upper bounds of subtree utility and local utility to prune the search space. The algorithm applies database projection technology and transaction merge technology to reduce the cost of database scanning. The MAHI [11] algorithm uses a projection technique to extract the high utility itemsets in the transaction database using the utility matrix and proposes a matrix-assisted pruning method, MA-prune, that significantly reduces the search space. In terms of runtime and memory usage, EFIM and Hminer algorithms have been proven to be the most efficient algorithms among high-utility pattern mining algorithms.
However, the algorithms introduced above can extract high utility patterns only from static databases. To overcome the inability of existing static methods to handle dynamic data, researchers proposed incremental pattern mining and stream pattern mining [25].
To process incremental data, Fournier-Viger et al. proposed the EIHI [26] algorithm, which is based on a utility-list structure and a HUI-trie structure to effectively mine HUIs in incremental environments. Its disadvantage is that it requires two database scans to establish the ascending TWU order. Yun et al. proposed an incremental HUI mining algorithm, LIHUP [27], based on a global list; its advantage is that it needs to scan the database only once. Lee et al. proposed the PIHUP [25] algorithm to mine HUIs in incremental databases using a novel tree-based structure with the pre-large concept. Yun et al. proposed the IIHUM [28] algorithm for incremental HUI mining. It designed a global list structure, the Indexed Utility List (IIU-List), and a reconstruction technique to process incremental data more effectively.
Data stream processing methods basically follow three models: the landmark model, the damped window model, and the sliding window model [17]. The sliding window model has received extensive research [3,12-15,17] because it focuses only on the latest data and uses limited memory. These algorithms can be mainly divided into tree-based, utility-list-based, and other methods.
Ahmed et al. proposed the HUPMS [12] algorithm with a HUS-tree structure to search for HUIs over data streams. It is based on the pattern growth method but generates a large number of candidates in the mining process. Ryang et al. proposed the SHU-growth [14] algorithm with a SHU-Tree structure to discover HUIs over data streams. It is also based on pattern growth and utilizes two overestimated-utility decreasing techniques, RGE and RLE. In terms of runtime and number of candidates, SHU-growth outperforms HUPMS. Kim et al. proposed a tree-based algorithm named DSHUP [29], which uses the sliding window model to process the latest data in the stream while using the damped window model to account for the temporal factor of batch data in the window.
Dawar et al. proposed the Vert_top-k_DS [3] algorithm, a one-phase top-k high utility itemset mining algorithm over data streams that does not generate candidates. It designs an iList structure that can efficiently update the utility information of itemsets. Baek et al. proposed the RHUPS [13] algorithm for mining recent high utility itemsets, using a global list-based data structure, RHU-list, to store and remove data more efficiently. Lee et al. proposed the SHAUPM [30] algorithm based on the sliding window model to mine high average utility itemsets, designing a new list structure called SHAUP-List.
Jaysawal et al. proposed the SOHUPDS [15] algorithm, which uses a projection-based strategy to mine HUIs and designs an IUDataListSW structure to store the utility, upper bound, and position of each item in the transaction. It devises an update strategy, based on a trie-tree structure, that reuses HUIs mined from the previous window to update the HUIs of the current window. Chen et al. proposed the HUMHDT [17] algorithm to mine HUIs over data streams with a historical data table, which significantly reduces redundant candidates and improves the efficiency of the algorithm.

Closed high utility itemsets mining
CHUIs are the concise and lossless representation of HUIs. Currently, many algorithms [16,18,19,21-23] in the field of closed high utility itemset mining follow the design of the DCI_CLOSED [31] algorithm. DCI_CLOSED is one of the fastest algorithms in closed frequent itemset mining; it employs a vertical bitmap representation of the dataset and a divide-and-conquer approach to subdivide the search space. In addition, it designs an efficient strategy to identify and prune duplicate itemsets during mining without storing previously generated closed itemsets in memory. Current closed high utility itemset mining algorithms can be mainly divided into three types: two-phase based, based on subsumption checks and closure computations, and based on forward and backward closure checking.
The CHUD [19] algorithm proposed by Tseng et al. is the first closed high utility pattern mining algorithm, extending the DCI-Closed algorithm. It is a two-phase, depth-first search-based algorithm: in the first phase it uses a modified DCI-Closed method to find candidates whose TWU exceeds the threshold, and in the second phase it scans the database to compute the true utility of the itemsets. The algorithm uses a DAHU method to recover the full set of high utility itemsets without scanning the original database. Since CHUD is a two-phase algorithm, it generates a large number of candidates and must repeatedly scan the original database to compute the actual utility of the itemsets, so its running time and memory consumption are high.
The algorithms based on subsumption checks and closure computations inherit the efficient search space traversal and closure computation methods of DCI-Closed to search all possible itemsets and efficiently identify and prune duplicate itemsets. Sahoo et al. proposed the CHUM [32] algorithm to mine CHUIs in transaction databases, with a gutility-list structure to maintain the utility information of itemsets and heuristic pruning information to prune the search space. Wu et al. proposed the CHUI-Miner [18] algorithm with an EU-List utility list structure to mine CHUIs; it is a one-phase algorithm that produces no candidates, but since it relies only on TWU and the remaining utility pruning strategy, it must perform a large number of costly join operations. The CLS-Miner [21] algorithm proposed by Dam et al. also uses the utility list structure and introduces three strategies, the Chain-EUCP strategy, the LBP strategy, and the coverage concept, to reduce the search space. Dam et al. also proposed the IncCHUI [23] algorithm, which mines CHUIs in dynamic databases with an incremental utility list structure; it scans the database only once to build and update the incremental utility lists and uses a hash-based structure, CHT, to store CHUIs. Chen et al. proposed the CHUI_DS [16] algorithm with a CH-List structure to mine CHUIs over data streams using the sliding window model. It uses a double hash table called BRU_table, which makes it easy to update the remaining utility of items in the CH-List after the window slides.
Fournier-Viger et al. proposed the EFIM-Closed [20] algorithm, which uses database projection and transaction merging to reduce the cost of database scans and adopts forward and backward closure checking together with a closure-jumping strategy to effectively mine CHUIs. The Hminer-Closed [22] algorithm proposed by Nguyen et al. also uses forward and backward closure checking with the closure-jumping strategy to prune non-closed itemsets. It proposed an MCUL structure to store the information needed to mine CHUIs and adopts the LA-prune and C-prune techniques to reduce the number of join operations. EFIM-Closed and Hminer-Closed are among the fastest algorithms for mining CHUIs.
Recently, researchers have proposed some new methods for mining closed high utility itemsets. Lin et al. proposed a decomposition-based compact genetic algorithm, DcGA [33], to discover closed high utility itemsets in limited time; it adopts a clustering model that divides the transactions into several groups based on their correlations, which helps train a better model. Pramanik et al. first proposed the CHUI-AC [34] algorithm, based on the ant colony algorithm, to mine CHUIs; it uses a bootstrap and routing graph to map the whole feasible search space to a directed acyclic graph. Lin et al. proposed MCUI-Miner [35] and developed a multi-objective k-means-based MapReduce architecture for mining CHUIs in large-scale datasets, with a GA-based model to rapidly search itemsets. Hidouri et al. proposed the SATCHUIM [36] algorithm, based on propositional satisfiability, for mining closed high utility itemsets; it encodes the closed high utility itemset mining task as a CNF propositional formula and uses the weighted clique cover problem to enhance efficiency. Experimental results show the algorithm is competitive with state-of-the-art methods.

Differences from previous works
The proposed method extracts closed high utility patterns from data streams and differs in several respects from IncCHUI [23], CHUI-DS [16], and other previous work on dynamic databases.
High utility itemset mining algorithms over dynamic databases have the "build once, mine many" property [12]: during the operation of the algorithm, the data structure and the resultset storage structure need to be initialized only once, after which multiple mining operations are performed on them. Current algorithms, such as EIHI [26], SOHUPDS [15], IncCHUI [23], HUPMS [12], and SHU-growth [14], all maintain a public resultset that stores the itemsets generated in the mining process and is dynamically updated as the algorithm runs. For example, the EIHI algorithm uses a HUI-trie structure to save the resultset of HUIs, SOHUPDS builds multiple trie-tree structures, IncCHUI uses a hash table structure CHT, HUPMS designs a tree-structured HUS-tree, and SHU-growth designs a SHU-Tree structure to maintain the resultset. The proposed FCHM-Stream algorithm is the first to use a skip-list based structure to save the resultset: CHUIs with the same support are stored in the same skip-list, sorted in ascending order of length, and in lexicographical order when lengths are equal.
The previous IncCHUI algorithm mines CHUIs in incremental databases, where it suffices to insert new CHUIs after new transactions arrive and to increase the utility and support of existing results. In contrast, the proposed algorithm mines CHUIs over data streams based on the sliding window model, which also requires removing results that are no longer high utility or closed as the window slides, and is therefore more complex than incremental mining.
Compared with the previous CHUI-DS algorithm, which uses the utility lists of all 1-itemsets during mining, the FCHM-Stream algorithm does not need all utility lists after the window slides; only the utility lists of items in the new batch are used. Fewer items are considered, which reduces the search space and significantly reduces the execution time. As for resultset reuse, CHUI-DS does not consider it, while FCHM-Stream adopts, for the first time, a resultset reuse strategy based on the skip-list structure, which reduces the repeated generation of itemsets and thereby the execution time.

FCHM-Stream
We propose an algorithm named FCHM-Stream, based on the sliding window model, to quickly mine CHUIs over data streams. We first present the CN-list utility list structure based on batch division, then the CN-list maintenance strategy and the RMS strategy for maintaining the resultset with the skip-list structure, and finally the recursive search mining algorithm.

CN-list structure
Since the window size of the sliding window model is fixed, when the window slides, the oldest batch is deleted and the new batch is inserted, so two adjacent windows share a large number of communal batches. The data in the old window consists of the oldest batch and the communal batches, and the oldest batch is discarded when the window slides. The data in the new window consists of the communal batches and the new batch.
For example, in the data stream shown in Fig. 1, window W1 consists of batches B1, B2, and B3; as the window slides, the old batch B1 is deleted, and window W2 consists of batches B2, B3, and B4. Thus, for windows W1 and W2, batch B1 is the oldest batch, batches B2 and B3 are the communal batches, and B4 is the new batch. Therefore, the itemsets in window W2 only need to hold their utility information for the communal batches B2 and B3 and the new batch B4. To efficiently store the utility information of itemsets in the window, we devised the CN-list utility list structure based on batch division, which stores the utility information of an itemset in the communal batches and the new batch of the current window.
Definition 12 (CN-list utility list structure). Let CB be the communal batch part of the current window and NB the new batch part, so that the current window is W = CB ∪ NB. The CN-list of an itemset X in the window W is denoted as ul(X) and consists of the utility information of the communal batches and the new batch, denoted as ulC and ulN, respectively. The tuples in the CN-list are the same as in the utility list of Definition 11 and consist of three parts: the transaction identifier tid, the utility of the itemset iutil, and the remaining utility rutil. Formally, ul(X) = ulC(X) ∪ ulN(X). For example, the CN-lists constructed for window 1 and window 2 of the running example are shown in Figs. 2 and 3. Since there is no communal batch in the first window, the CN-lists of itemsets in window 1 have only the new batch part, i.e., the ulN part.
Based on the above definitions, we can get the following properties about the CN-list utility list.

Property 3
The sum of iutil values. Let X be an itemset in the window W. The utility u(X) of the itemset X equals the sum of all iutil values in its CN-list, that is, u(X) = Σ_{e∈ulC(X)} e.iutil + Σ_{e∈ulN(X)} e.iutil. If u(X) is greater than or equal to the minimum utility threshold minutil, X is a high utility itemset; otherwise, X is a low utility itemset.

Property 4
Remaining utility pruning based on the CN-list, similar to the remaining utility pruning strategy of the HUI-Miner [6] algorithm. Let X be an itemset and let Y be an extension of X obtained by appending an item y to X such that y ≻ i for all i ∈ X. If the sum of the iutil and rutil values in ulC(X) plus the sum of the iutil and rutil values in ulN(X) is less than minutil, then X and all its extensions are low-utility itemsets.
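A minimal Python sketch of Definition 12 together with Properties 3 and 4: a CN-list split into a communal part and a new-batch part, with the utility and the remaining-utility upper bound computed over both parts. The class, field names, and entry values are ours, for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class CNList:
    # Each entry is a (tid, iutil, rutil) tuple as in Definition 11.
    ul_C: list = field(default_factory=list)  # communal-batch part
    ul_N: list = field(default_factory=list)  # new-batch part

    def utility(self):
        # Property 3: u(X) is the sum of all iutil values in both parts.
        return sum(e[1] for e in self.ul_C) + sum(e[1] for e in self.ul_N)

    def upper_bound(self):
        # Property 4: sum of iutil + rutil over both parts bounds X and
        # all its extensions; below minutil, the subtree can be pruned.
        return sum(e[1] + e[2] for e in self.ul_C + self.ul_N)

cn = CNList(ul_C=[(1, 4, 3)], ul_N=[(5, 2, 2)])
print(cn.utility(), cn.upper_bound())  # 6 11
```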

Maintenance strategies
In this subsection, we introduce two strategies: the construction and update strategy of the global CN-list, and the resultset maintenance strategy (RMS).

Construction and update strategy of CN-list
We utilize the CN-list structure introduced in Sect. 4.1 to efficiently store and update the utility information of the itemsets in the sliding window. To process the data stream in real time, this paper proposes a CN-list construction and update strategy, which uses one scan of the dataset to construct the CN-lists of the 1-itemsets and store the utility information of the items in the dataset.
The algorithm first initializes a global set of CN-list structures, called CNL. Since the algorithm is based on the sliding window model, as the window slides, the CN-lists of the 1-itemsets must update the utility information of the items in the window: the utility information of items belonging to the old batch is deleted, the utility information of items in the new batch is inserted, and the TWU of the items is dynamically updated. Therefore, the CN-lists in CNL need to be updated every time the window slides.
In constructing the CN-lists of the 1-itemsets, the algorithm treats the first window differently from the others. For the first window, Algorithm 1 (lines 1-9) is called to scan the input transactions, construct the CN-list of every item, initialize the utility value of the item in each transaction, initialize the remaining utility to zero, and add the list to the CNL, while updating the TWU value of each item in the window. If the ulN part of an item in the CNL is empty, then by Property 7 the item and its supersets need not be searched in the subsequent search process, so the item is not added to the set ULs of items to be searched (Algorithm 1, lines 12-16). Algorithm 1 (lines 17-24) then sorts the CN-list of each item in the ULs according to the TWU order and updates the remaining utility of the CN-lists of all 1-itemsets using the Ru array. For the other windows, each time the window slides, Algorithm 3 (lines 1-6) is first used to remove the utility information belonging to the old batch from the CN-lists and to update the TWU of each item; Algorithm 1 is then called again to scan the transactions in the new batch and build the CN-lists of the items it contains. Subsequently, the CN-list of each item in the ULs is sorted according to the new TWU order, and finally the remaining utilities of the CN-lists of all 1-itemsets in the ULs are updated using the Ru array.

The construction of the CN-list of a k-itemset is shown in Algorithm 2. Since the CN-list is divided into two parts, when constructing the CN-list of a k-itemset, the C part and the N part must be constructed separately. During this construction, the algorithm uses Property 5 (the LA-prune pruning strategy) to reduce the number of CN-list constructions. The algorithm first constructs the N part of the CN-list, followed by the C part.
And according to property 7, if part of N the CN-list is empty during construction, it is not necessary to build the CN-list of the itemset (Algorithm 2, lines 9-10).
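The two-part CN-list described above can be sketched as follows. This is an illustrative Python sketch, not the paper's actual Java implementation; all class and field names (`Element`, `ul_C`, `ul_N`) are assumptions of this sketch.

```python
# Illustrative sketch of a CN-list split into a communal part (ul_C)
# and a new-batch part (ul_N); names are assumptions, not the paper's
# actual implementation.

class Element:
    """One utility-list entry: transaction id, item utility, remaining utility."""
    def __init__(self, tid, iutil, rutil=0):
        self.tid, self.iutil, self.rutil = tid, iutil, rutil

class CNList:
    def __init__(self, itemset):
        self.itemset = itemset
        self.ul_C = []   # entries from the communal batches of the window
        self.ul_N = []   # entries from the newest batch

    def total_utility(self):
        return sum(e.iutil for e in self.ul_C + self.ul_N)

    def upper_bound(self):
        # iutil + rutil over both parts (used by the pruning condition)
        return sum(e.iutil + e.rutil for e in self.ul_C + self.ul_N)

    def merge_new_into_communal(self):
        # After a window is mined, the newest batch becomes communal:
        # move ul_N into ul_C and empty ul_N for the next batch.
        self.ul_C.extend(self.ul_N)
        self.ul_N = []

    def drop_old_batch(self, old_tids):
        # When the window slides, delete the entries of the oldest batch.
        self.ul_C = [e for e in self.ul_C if e.tid not in old_tids]
```

Splitting the list this way is what lets the window slide without rebuilding the whole structure: only `ul_N` is filled from the new batch, and only the oldest batch's entries are removed from `ul_C`.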

Resultset maintenance strategy
This section proposes a resultset maintenance strategy (abbreviated as RMS) using the skip-list structure and elaborates how the strategy maintains the CHUIs in the sliding window.
Since the size of the sliding window is fixed, there are a large number of communal batches between two adjacent windows. Experimental statistics show that these communal batches lead to a large number of identical itemsets in the resultsets of two adjacent windows, and the algorithm consumes a lot of time repeatedly mining these identical itemsets. Therefore, a resultset maintenance strategy (RMS) based on the skip-list structure is proposed, which can effectively maintain and update the resultset, thereby reducing the workload of the mining algorithm and improving its performance.
The RMS strategy uses a collection of skip-list structures to hold the mined resultset. As shown in Fig. 4, itemsets with the same support are stored in the same skip-list structure; within a skip-list, itemsets are sorted in ascending order of length, and itemsets of the same length are sorted in lexicographical order. In addition, each itemset stored in the skip-list structure records the Tid and iutil of its transactions in the current window, so that when the old batch is deleted, the support and utility value of the itemset can be updated conveniently.
Following the sliding of the window, the strategy consists of two parts: after deleting the old batch, maintaining the itemsets in the resultset that remain closed and high utility in the communal batch; and after inserting the new batch, maintaining the utility and support of the itemsets in the resultset. When the mining of the CHUIs in a window is completed, the algorithm stores the CHUIs of that window. The itemsets in the resultset fall into three cases: (1) the itemset exists only in the old batch that will be deleted; (2) the itemset exists only in the communal batch shared by this window and the next window; (3) the itemset exists in both the old batch and the communal batch.
As the window slides, the old batch must be deleted. For cases 1 and 3, the RMS strategy uses Algorithm 3 (lines 7-9) to traverse the resultset, remove the utility and support contributed by the old batch, and delete the itemsets that are no longer high utility in the communal batch. Then, using Algorithm 3 (lines 10-12), the resultset is traversed to check whether each itemset is still closed in the communal batch. After these steps, all itemsets in the current resultset satisfy case 2, i.e., they are CHUIs in the communal batch. According to Property 6, the CHUIs in the communal batch remain closed and high utility in the new window after the insertion of the new batch. The algorithm then scans the new batch and generates new CHUIs; for each newly generated CHUI, it updates the utility and support if the itemset already exists in the resultset, or inserts it into the skip-list structure corresponding to its support if it does not (Algorithm 5, lines 19-23).
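The deletion step of the RMS strategy can be sketched as follows. This is an illustrative reading of Algorithm 3's resultset pass, not its actual pseudocode: the resultset is modeled as a dictionary, the closure check is abstracted as a callable, and all names are assumptions of this sketch.

```python
# Sketch of the RMS deletion step when the window slides: subtract the
# old batch's contribution from each saved itemset, drop itemsets that
# are no longer high utility, then drop itemsets that are no longer
# closed. All names are illustrative.

def rms_delete_old_batch(resultset, old_batch_tids, minutil, is_closed):
    # resultset: itemset -> {'tids': {tid: iutil}, 'support': int, 'utility': int}
    for itemset in list(resultset):
        info = resultset[itemset]
        for tid in old_batch_tids & info['tids'].keys():
            info['utility'] -= info['tids'].pop(tid)
            info['support'] -= 1
        if info['utility'] < minutil:          # no longer high utility
            del resultset[itemset]
    for itemset in list(resultset):
        if not is_closed(itemset, resultset):  # no longer closed
            del resultset[itemset]
    return resultset

def same_support_superset_free(itemset, resultset):
    # An itemset stays closed if no proper superset has the same support.
    s = resultset[itemset]['support']
    return not any(set(itemset) < set(other) and resultset[other]['support'] == s
                   for other in resultset)
```

After this pass, every surviving itemset is a CHUI of the communal batch, which is exactly the precondition Property 6 needs.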

Property 6
If the itemset X is a CHUI in the communal batch CB, then after inserting the new batch NB, X remains a CHUI in the new window W_new = CB ∪ NB.
Proof Let the itemset X belong to the communal batch CB and let X be a CHUI. According to Definition 6 and Definition 9, the support of any superset Y of X in CB is smaller than the support of X. After inserting a new batch NB, the itemset X may or may not appear in NB within the new window W_new = CB ∪ NB. If X ∉ NB, then Y ∉ NB, so the support and utility values of X and Y remain unchanged in W_new; therefore X is still a CHUI in W_new. If X ∈ NB, the support and utility of X increase after inserting NB; the support of X remains greater than the support of its superset Y whether or not Y appears in NB, and the utility of X is still greater than the minimum utility threshold after the increase, so X is still a CHUI in W_new. In summary, if the itemset X belongs to the communal batch CB and X is a CHUI, then after inserting the new batch NB, X is still a CHUI in the window W_new = CB ∪ NB.

Property 7
For an itemset X ∈ W_new, if the ul_N part of the CN-list of X is empty, neither X nor its supersets need to be explored.
Proof If the ul_N part of the CN-list of the itemset X is empty, then X does not appear in the new batch of the current window W_new, and the support and utility of X and its supersets depend only on their utility information in the communal batch. According to the RMS strategy, if X exists only in the communal batch and not in the new batch, X has already been saved in the resultset, and the support and utility of X and its supersets are not changed by the insertion of the new batch. Therefore, neither X nor its supersets need to be re-explored during the construction of the 1-itemsets and k-itemsets.
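Property 7 acts as a simple filter over the 1-itemsets before the search starts. A minimal sketch, with the CN-list collection modeled as a plain dictionary (an assumption of this sketch):

```python
# Property 7 as a filter: in a non-first window, a 1-itemset whose
# CN-list has an empty new-batch part need not be explored, since its
# CHUIs are already preserved in the resultset by the RMS strategy.
# cn_lists maps item -> (ul_C, ul_N); the structure is illustrative.

def items_to_search(cn_lists):
    return [item for item, (ul_C, ul_N) in cn_lists.items() if ul_N]

cnl = {'a': ([(1, 5)], [(7, 2)]),   # appears in the new batch -> search
       'd': ([(2, 6)], [])}         # only in communal batches -> skip
print(items_to_search(cnl))          # ['a']
```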

FCHM_stream algorithm
The overall process of the algorithm proposed in this paper is shown in Fig. 5. As new batches arrive, the algorithm continuously inserts the latest batch into the sliding window. Whenever the number of batches in the window equals the window size, the algorithm determines whether it is the first window and processes the first window and the other windows separately, as shown in the pseudo-code of Algorithm 4. First, the algorithm initializes the array structure that stores the sliding window, the data in the new batch, the set of CN-list structures proposed in Sect. 4.1, the resultset, and the variables that record the oldest batch and the number of batches. The algorithm then adds the transactions from the data stream to the new-batch array. When the number of transactions in the new-batch array equals the batch size, the array is added to the window array and the batch count is incremented. When the number of batches equals the window size, Algorithm 1 is called by Algorithm 4 (line 17) to mine the first sliding window for the set of CHUIs.
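The outer loop of Algorithm 4 can be sketched as follows. The batching and window-filling logic is taken from the description above; `mine_window` and `slide` are hypothetical callables standing in for the mining and window-maintenance procedures.

```python
from collections import deque

# Sketch of the outer loop of Algorithm 4: group incoming transactions
# into batches, keep `winsize` batches in a sliding window, and trigger
# mining whenever the window is full. `mine_window` and `slide` stand in
# for the paper's procedures and are assumptions of this sketch.

def stream_driver(transactions, winsize, batchsize, mine_window, slide):
    window = deque(maxlen=winsize)   # oldest batch falls out automatically
    new_batch, first = [], True
    for t in transactions:
        new_batch.append(t)
        if len(new_batch) == batchsize:
            window.append(new_batch)
            new_batch = []
            if len(window) == winsize:
                mine_window(list(window), first_window=first)
                first = False
                slide()   # prepare the CN-lists for the next batch
```

For example, with `winsize=3` and `batchsize=2`, a stream of 8 transactions forms 4 batches and triggers mining twice: once for the first full window and once after the window slides.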
For the first window, the algorithm uses Algorithm 1 (lines 1-24) to scan the input transactions once, build the CN-list structure for all items in the transactions, and add them to the CNL, while recording the TWU of each item. When the scan is completed, the CN-lists in the CNL are sorted in ascending TWU order, and the remaining utility is then calculated for all CN-lists in the CNL. Algorithm 5 is then called to recursively mine all CHUIs in the current window. As it is the first window, the algorithm does not need to check whether the itemsets exist in the resultset, and directly inserts the closed high utility itemsets into the corresponding skip-lists. The window then slides to delete the oldest batch. Finally, Algorithm 1 (lines 26-29) is called to merge the data of the ul_N of all CN-lists into the ul_C and set the ul_N to empty, to prepare for storing the data of the new batch.
For each subsequent window, Algorithm 4 (line 13) is called to first calculate the oldest batch number. Algorithm 3 is then executed to process the old batch: it deletes the utility information of the old batch from all CN-lists in the CNL and updates the TWU of each item, deletes all itemsets in the resultset that have low utility or are no longer closed after the deletion of the old batch, and moves the itemsets that remain closed to the skip-lists corresponding to their updated support. Algorithm 1 (lines 1-19) is then executed to scan only the new batch and update the CNL, and Algorithm 5 is invoked to mine the CHUIs: if a CHUI already exists in the resultset, its utility and support are updated and it is stored in the skip-list corresponding to its support; otherwise it is inserted into the resultset.

Property 9 [18] Let X be an itemset and y an item in PostSet(X). If TidSet(X) ⊆ TidSet(y), then every transaction containing X also contains y, so X can be directly extended with y.
The process of Algorithm 5 is similar to the CHUI-Miner algorithm [18]. The input of the algorithm is the itemset P, the two sets of items PreSet(P) and PostSet(P), and the minimum utility threshold minutil. The whole process outputs all CHUIs extending the itemset P. The overall steps of the search process are as follows. For each item x in PostSet(P), the algorithm constructs the itemset Px = P ∪ {x} and its CN-list ul(Px). According to Property 4, if the sum of the iutil and rutil values in ul(Px).ul_C plus the sum of the iutil and rutil values in ul(Px).ul_N is greater than minutil, then Px is a potential candidate. Then, according to Definition 13 and Property 9, the algorithm performs the isSubsumedCheck procedure to check whether a previously mined closed high utility itemset subsumes Px; if it does, Px need not be explored. Otherwise, according to Property 9, a larger itemset Pxy is obtained by extending Px with each item y in PostSet(P) such that y ≻ x ∧ TidSet(Px) ⊆ TidSet(y), and Algorithm 2 is used to build the CN-list of Pxy. According to Property 7, while constructing the CN-list of Pxy it is necessary to check whether the ul_N of the CN-list of Pxy is empty; if it is empty, the CN-list of Pxy need not be constructed. Finally, according to Property 3, it is checked whether the utility of Pxy is greater than minutil. If so, the algorithm checks whether Pxy exists in the resultset: if it does not exist, it is inserted into the resultset; if it exists, its utility and support are updated.
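The skeleton of this recursive search can be sketched as follows. This is a structural sketch only: the utility, upper-bound, and subsumption computations are passed in as callables, and the toy demo at the end uses an additive per-item utility that is an assumption of this sketch, not the paper's utility model.

```python
# Structural sketch of the recursive Search() procedure: extend the
# current itemset P with each item from postset, prune candidates whose
# utility upper bound falls below minutil, skip candidates subsumed by
# an already-found CHUI, and record an extension when its exact utility
# reaches minutil. Helper callables stand in for CN-list operations.

def search(P, postset, minutil, utility, upper_bound, subsumed, results):
    for i, x in enumerate(postset):
        Px = P + (x,)
        if upper_bound(Px) < minutil:        # Property 4 style pruning
            continue
        if subsumed(Px, results):            # isSubsumedCheck stand-in
            continue
        if utility(Px) >= minutil:
            results.add(Px)
        search(Px, postset[i + 1:], minutil,
               utility, upper_bound, subsumed, results)
    return results

# Toy run with additive utilities (an assumption of this demo).
vals = {'a': 10, 'b': 7, 'c': 2}
order = ('a', 'b', 'c')
def util(s):
    return sum(vals[i] for i in s)
def ub(s):
    tail = order[order.index(s[-1]) + 1:]     # optimistic remaining utility
    return util(s) + sum(vals[i] for i in tail)
found = search((), order, 15, util, ub, lambda *_: False, set())
print(sorted(found))   # [('a', 'b'), ('a', 'b', 'c')]
```

Note how the upper bound prunes whole branches: ('b',) has bound 7 + 2 = 9 < 15, so neither it nor its extensions are ever constructed.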

Example of FCHM-stream
This section gives an example of FCHM-Stream to describe the specific operation of the algorithm. Using the data stream in Fig. 1, set winsize to 3 (the window contains 3 batches), batchsize to 2 (each batch contains two transactions), and the minimum utility threshold minutil to 20.
First, Algorithm 4 (lines 1-5) is called to initialize the window array that holds the data in the window and the newBatch array that holds the new batch data, with sizes 3 and 2, respectively. The algorithm then initializes the CNL that holds the global CN-lists, the resultSet array that holds the resultsets, the oldestBatch variable that records the oldest batch, and the batchCount variable that counts the number of batches. The algorithm first reads 3 batches and inserts them into the window. At this point batchCount is 3, which equals winsize, so the algorithm runs the mining process of the first window. It calls Algorithm 1 (lines 1-9) to build the CN-lists for the items in the first window and record the TWU of each item. After the window scan, the TWU of the items in window 1 is shown in Fig. 2. Then, Algorithm 5 is called to recursively mine all closed high utility itemsets in window 1 and add them to the skip-list-based resultset. The resultset of window 1 is shown in Table 3.
Then the algorithm deletes the data of the oldest batch (batch 1) from the window, reads the new batch (batch 4), and the window slides. Since the current batchCount of 4 is larger than winsize, the algorithm performs the mining process of a subsequent window. Algorithm 4 (lines 13-14) is used to calculate the batch number (oldestBatch) of the oldest batch (batch 1), and Algorithm 3 is called to first update the CN-list and TWU of each item, then delete the utility information of the old batch from the resultset and remove the low utility itemsets and no-longer-closed itemsets from the resultset. Subsequently, Algorithm 1 (lines 1-9) is called to build the CN-list structures of the items in the new batch and recalculate the TWU of the items; the TWU of the items in window 2 is shown in Table 4. Algorithm 1 (lines 10-24) is used to sort the CN-lists in the CNL in ascending TWU order and calculate the remaining utility of all CN-lists in reverse order. According to Property 7, since item d does not exist in the new batch, the CN-list of item d need not be searched during the search process. The constructed CNL is shown in Fig. 3. Algorithm 5 is then called to mine the closed high utility itemsets in window 2. If an itemset already exists in the resultset, its utility and support are updated and it is moved to the skip-list structure corresponding to its new support; if it does not exist in the resultset, it is inserted into the corresponding skip-list structure. The resultset of window 2 is shown in Table 5. Analyzing the resultsets of the two windows shows that 12 itemsets and 11 itemsets are generated in the two windows, respectively.
Table 4 TWU values for window 2

Item  d    b    c    e    a    f
TWU   169  179  189  189  199  207

The itemset (a, b, c, d, e, f) with support 1 in window 1 and the itemsets (b, d, f), (c, d, e), and (a, c, e, f) with support 2 satisfy case 2: they exist only in the communal batch of the two windows and do not appear in the new batch, so they are still closed high utility itemsets in window 2, and their support and utility values remain unchanged. The itemset (d) with support 4 in window 1 satisfies case 3: it exists in both the old batch and the communal batch and does not appear in the new batch. After deleting the old batch using Algorithm 3, its support and utility values decrease, but it remains closed and high utility. During the mining of window 2, these 5 itemsets do not need to be mined again. Therefore, among the 11 itemsets in window 2, 5 itemsets need not be mined repeatedly, and the RMS strategy reduces the workload by 45%. The itemset (c, e) with support 3 in window 1 satisfies case 2 and exists in the communal batch of the two windows, but it also appears in the new batch, so its support and utility must be increased; in window 2 its support increases to 4. For the remaining 6 of the 12 itemsets of window 1, Algorithm 3 determines that they become low utility or non-closed after the utility of the old batch is deleted, and they are removed from the resultset. In window 2, apart from the 5 unchanged itemsets and 1 updated itemset mentioned above, the remaining itemsets are generated from the new batch.
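The 45% figure in this example can be checked directly: 5 of the 11 CHUIs of window 2 are carried over from window 1 by the RMS strategy.

```python
# Quick check of the worked example's reuse figure: window 2 contains
# 11 CHUIs, of which 5 are carried over from window 1 by the RMS strategy.
reused, total = 5, 11
print(f"workload reduced by {reused / total:.0%}")   # workload reduced by 45%
```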

Complexity of FCHM-stream
In this subsection, we analyze the complexity of the proposed FCHM-Stream algorithm, which mainly includes two parts: the construction and update procedure for CN-list, and the recursive search procedure.
(1) Construction and update procedure for the CN-list. The algorithm constructs or updates the CN-list set CNL of the 1-itemsets and records the TWU of the items by scanning the database once. Thus, the time complexity is O(n_Win × T_avg) or O(n_B × T_avg), where n_Win, n_B, and T_avg are the number of transactions in the window, the number of transactions in a batch, and the average transaction length, respectively. Let m be the number of items in the sliding window. The time complexity of sorting the CNL in ascending TWU order is O(m log m). The time complexity of updating the remaining utility of the CNL in reverse order for the first window or the new batch is O(n_Win × T_avg) or O(n_B × T_avg). Therefore, the total time complexity of the CN-list construction and update procedure is O(n_Win × T_avg + m log m + n_Win × T_avg) or O(n_B × T_avg + m log m + n_B × T_avg), which simplifies to O(n_Win × T_avg + m log m) or O(n_B × T_avg + m log m).
(2) Search process. The algorithm recursively calls Algorithm 5, Search(), to search for CHUIs in the current window. The time complexity of the search process is proportional to the number of isSubsumedCheck() calls as well as the number of candidates [17]. In the worst case, the algorithm prunes no candidates and the search space contains all items in the window, so 2^m − 1 itemsets are considered and the worst-case time complexity is O(2^m − 1) [23]. Therefore, the worst-case time complexity of FCHM-Stream is O(2^m − 1), proportional to the number of candidates considered in the search space [23], so an efficient pruning strategy can greatly improve the execution time of the algorithm.

Performance evaluations
In this section, a large number of comparative experiments are designed to evaluate the spatio-temporal performance of the proposed algorithm. The proposed algorithm is compared with six current state-of-the-art closed high utility pattern mining algorithms, and its spatio-temporal performance is evaluated in different situations by varying the minimum utility threshold, the window size, and the batch size.

Experiment setup
All experiments were carried out on a computer equipped with a 64-bit Xeon(R) Gold 6154 3.00 GHz Intel processor and 256 GB of main memory, running Ubuntu 16.04.6 LTS as the operating system. All algorithms were implemented by extending the SPMF open-source Java library and were run on JDK 1.8. CHUI-Miner [18] and EFIM-Closed [20] were obtained from SPMF, IncCHUI [23] was obtained from GitHub, CHUI-DS [16] was obtained from its authors, and we implemented CLS-Miner [21] and HMiner-Closed [22].
To analyze the performance of the proposed algorithm in different situations, experiments were performed on six different benchmark datasets. The characteristics of the datasets used in the experiments are given in Table 6, where #Trans, #Tavg(A), #Items(I), and #Density denote the number of transactions, the average transaction length, the number of distinct items, and the density of the dataset (A/I), respectively. The Mushroom, Connect, Retail, and BMS datasets were obtained from the FIMI Repository, the Chainstore dataset was obtained from NU-MineBench 2.0, and the Foodmart dataset was obtained from the Microsoft FoodMart 2000 database. Except for the Chainstore and Foodmart datasets, the datasets have no real utility values, so internal utilities were randomly generated in [1, 5] and external utilities were generated in [1, 10] using a log-normal distribution, following the method adopted in previous work [23].
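The synthetic utility assignment just described can be sketched as follows. The uniform range and the log-normal choice come from the text above; the specific log-normal parameters (mu, sigma) and the clipping to [1, 10] are assumptions of this sketch, since the paper does not state them.

```python
import random

# Sketch of the synthetic utility assignment: internal utilities drawn
# uniformly from [1, 5], external utilities from a log-normal
# distribution restricted to [1, 10]. The log-normal parameters and the
# clipping are assumptions of this sketch.

def internal_utility(rng):
    return rng.randint(1, 5)            # inclusive uniform integer

def external_utility(rng, mu=0.0, sigma=1.0):
    return min(10.0, max(1.0, rng.lognormvariate(mu, sigma)))

rng = random.Random(42)                  # seeded for reproducibility
assert all(1 <= internal_utility(rng) <= 5 for _ in range(1000))
assert all(1.0 <= external_utility(rng) <= 10.0 for _ in range(1000))
```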

Performance evaluations with various minimum utility thresholds
Since there are few algorithms for mining closed high utility patterns over data streams, we test the performance of the FCHM-Stream algorithm by varying the minimum utility threshold both on static databases and over data streams. FCHM-Stream was compared with the previous state-of-the-art algorithms for mining CHUIs: CHUI-Miner [18], EFIM-Closed [20], CLS-Miner [21], HMiner-Closed [22], IncCHUI [23], and CHUI-DS [16]. The CHUI-DS and FCHM-Stream algorithms were run in single-window mode (the window size was set to 1 and the batch size to the number of transactions in the dataset). Figure 6 shows the execution time of the seven algorithms in the static environment. As the minimum utility threshold increases, the number of CHUIs and candidates generated gradually decreases, so the execution time of all seven algorithms gradually decreases. CLS-Miner was the previous state-of-the-art algorithm for mining closed high utility itemsets on sparse datasets. On the sparse datasets Chainstore and Retail, FCHM-Stream is the fastest, up to 28% and 37% faster than CLS-Miner, respectively. The proposed algorithm is the fastest except for HMiner-Closed on the BMS dataset and the fastest except for EFIM-Closed on the Foodmart dataset, and it is 69% and 58% faster than CLS-Miner on these two datasets, respectively. Compared with CLS-Miner, the proposed algorithm uses not only the EUCS pruning strategy but also the LA-Prune pruning strategy on static datasets, which prunes more of the search space, avoids generating too many candidates, and greatly reduces the number of utility-list join operations, thereby reducing the execution time. EFIM-Closed and HMiner-Closed were the previous state-of-the-art algorithms for mining closed high utility itemsets on dense datasets. On the Connect and Mushroom datasets, HMiner-Closed and EFIM-Closed have the highest performance, respectively, and FCHM-Stream is next only to them. On the Mushroom dataset, HMiner-Closed performs poorly when the minimum utility threshold is small and performs best when the threshold is large; on the Connect dataset, the same holds for EFIM-Closed. Both EFIM-Closed and HMiner-Closed use forward and backward closure checking to prune non-closed itemsets, and both adopt closure jumping to improve performance. The experimental results show that both the projection-based method of EFIM-Closed and the MCUL-based method of HMiner-Closed achieve higher performance on dense datasets. In summary, the proposed algorithm prunes the search space with the LA-Prune and EUCS pruning strategies; it has the best performance on sparse datasets and is second only to the previous state-of-the-art EFIM-Closed and HMiner-Closed algorithms on dense datasets in terms of running time.
In Fig. 7, we compare the memory consumption of the seven algorithms on the static databases. On the sparse datasets BMS and Foodmart, the proposed FCHM-Stream algorithm consumes the least memory. On the sparse datasets Chainstore and Retail, the proposed algorithm consumes more memory than the EFIM-Closed and CHUI-Miner algorithms because it builds the EUCS structure on these sparse datasets. The CLS-Miner algorithm not only builds the EUCS structure but also stores the coverage of items, so it consumes even more memory. On the dense datasets Connect and Mushroom, FCHM-Stream consumes the least memory except for the CHUI-Miner and CLS-Miner algorithms; the reason is that FCHM-Stream adopts the resultset maintenance strategy, so it requires additional memory to store and maintain the resultset.
Next, the proposed algorithm is compared with the previous state-of-the-art CHUI-DS algorithm over data streams. Since both FCHM-Stream and CHUI-DS are based on the sliding window model, both have two parameters: the window size and the batch size. Therefore, the window size and batch size of both algorithms are fixed in the experiments, and the minimum utility threshold is varied for the comparative analysis. The comparisons are performed on the sparse datasets Chainstore and Retail and the dense datasets Connect and Mushroom. The window size for Chainstore, Retail, Connect, and Mushroom is set to 7, 5, 3, and 4, respectively, and the batch size to 100 k, 10,000, 10,000, and 1000, respectively.
A utility-list-based algorithm must construct every candidate through costly utility-list join operations during the search process, so the number of join operations largely determines the spatio-temporal efficiency of the algorithm: the more join operations, the worse the spatio-temporal efficiency [21]. In addition, to evaluate the proposed resultset maintenance strategy (RMS), a new evaluation metric is devised, the Average Resultset Reuse Rate (abbreviated as ARR), which records the average probability that the resultset is reused during the whole run of the algorithm. The higher the ARR, the fewer new CHUIs need to be regenerated when the window slides, the fewer time-consuming join operations are executed, and the less time is consumed. The number of join operations (abbreviated as Joincount), the ARR, the memory consumption, and the number of patterns (abbreviated as Patterncount) generated by the proposed algorithm and the CHUI-DS algorithm on the four datasets are shown in Table 7.
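The ARR metric can be sketched as follows. The paper gives no closed formula, so this sketch assumes a natural reading: per window, the reuse rate is the fraction of the current resultset carried over from the previous window's resultset, averaged over all non-first windows.

```python
# Sketch of the ARR metric under an assumed definition: per window,
# reuse rate = |itemsets carried over from the previous window's
# resultset| / |current resultset|, averaged over non-first windows.
# The paper does not state the exact formula.

def average_resultset_reuse_rate(windows):
    # windows: list of resultsets, each a set of itemsets
    rates = []
    for prev, cur in zip(windows, windows[1:]):
        if cur:
            rates.append(len(prev & cur) / len(cur))
    return sum(rates) / len(rates) if rates else 0.0

w1 = {('a',), ('b',), ('a', 'b')}
w2 = {('a',), ('a', 'b'), ('c',), ('b', 'c')}
print(average_resultset_reuse_rate([w1, w2]))   # 2 of 4 reused -> 0.5
```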
As shown in Table 7, the number of patterns generated by both algorithms and the number of join operations decrease continuously as the minimum utility threshold increases on all four datasets; accordingly, the running time of both algorithms can be observed to decrease continuously in Fig. 8. The ARR values in Table 7 show that the ARR on Chainstore, Retail, Mushroom, and Connect is up to 4.1%, 9.2%, 36%, and 0%, respectively. Since the proposed algorithm uses the RMS strategy to reuse the resultset and adopts the LA-Prune pruning strategy, it performs far fewer join operations than CHUI-DS on the two sparse datasets Chainstore and Retail; on the dense dataset Mushroom, its number of join operations is about 30% of that of CHUI-DS, while on the Connect dataset the reduction is smaller, about 97% of CHUI-DS. Because the proposed algorithm has a clear advantage in resultset reuse and in the number of join operations on the Chainstore, Retail, and Mushroom datasets, Fig. 8 shows a significant reduction in runtime compared to CHUI-DS: up to 71.9% on Chainstore, up to 65.9% on Retail, and up to 21.8% on Mushroom. Due to the lower ARR value and the smaller reduction in join operations on the Connect dataset, the time consumption there is reduced by at most 2.4% compared to CHUI-DS.
As the utility threshold increases, the number of patterns generated and the number of join operations performed decrease, so both the proposed algorithm and CHUI-DS consume less memory overall. Since the proposed algorithm uses the skip-list-based RMS strategy, which builds multi-level indexes for each node to achieve fast seeking and stores the TidSet and utility information of each itemset in its node, the proposed algorithm consumes relatively more memory.

Performance evaluations with various window sizes
Then, we evaluated FCHM-Stream against the CHUI-DS algorithm under various window sizes with the batch size and minimum utility threshold fixed. The experiments compare the sparse datasets Chainstore and Retail and the dense datasets Connect and Mushroom.
The minimum utility threshold for Chainstore, Retail, Connect, and Mushroom is fixed as 1000 k, 5000, 10000 k, and 5000, respectively. The batch size for Chainstore, Retail, Connect, and Mushroom is set to 100 k, 10,000, 10,000, and 1000, respectively.
As shown in Table 8, as the window size increases, the number of transactions in each sliding window increases, and the number of join operations and generated patterns gradually increase, so the running time of the algorithms gradually increases, as shown in Fig. 9. On the four datasets, Table 8 shows that the ARR value is up to 3.9% on Chainstore, up to 9.4% on Retail, up to 43.3% on Mushroom, and 0% on Connect. Owing to the effective LA-Prune pruning strategy and the RMS strategy, the proposed algorithm performs far fewer join operations than CHUI-DS on the two sparse datasets Chainstore and Retail, about 33% of CHUI-DS on Mushroom, and about 97% of CHUI-DS on Connect. Notably, on Chainstore, when the window size is 4 or 5, the number of join operations is even smaller than the number of patterns generated. Both algorithms are based on the utility-list structure, which constructs candidates through join operations; by using the RMS strategy, the proposed algorithm avoids repeatedly performing join operations to reproduce the same resultset and thus reduces the number of join operations it must perform. As shown in Fig. 9, the proposed algorithm reduces the running time by up to 74.6%, 59.7%, and 25.2% on the Chainstore, Retail, and Mushroom datasets, respectively, owing to the smaller number of join operations on these datasets. On Connect, since the ARR value is 0% and the number of join operations is reduced by only about 3%, the runtime is reduced by at most 7.6%. In addition, the higher the ARR and the fewer the join operations, the larger the running-time gap between the algorithms.
In terms of memory consumption, as the window size increases, the number of transactions in the sliding window increases and the numbers of generated patterns and join operations gradually increase; since the proposed algorithm must store and maintain the resultset during window sliding through the RMS strategy, its memory consumption gradually increases and exceeds that of the CHUI-DS algorithm.

Performance evaluations with various batch sizes
This subsection tests the performance of the algorithm by varying the batch size while fixing the window size and the minimum utility threshold.
The minimum utility threshold for Chainstore, Retail, Connect, and Mushroom is fixed as 1000 k, 5000, 10000 k, and 5000, respectively. The window size for Chainstore, Retail, Connect, and Mushroom is set to 6, 4, 3, and 4, respectively.
As shown in Table 9, as the batch size increases, the number of transactions contained in the sliding window increases, and the number of generated patterns and join operations generally increase accordingly, so the running time of both algorithms gradually increases. The highest ARR of the proposed algorithm on the four datasets is 3.1% on Chainstore, 12.4% on Retail, 35.8% on Mushroom, and 0% on Connect. The number of join operations on the two sparse datasets Chainstore and Retail is much smaller than for the CHUI-DS algorithm, while on the Mushroom and Connect datasets it is about 33.6% and 96.9% of CHUI-DS, respectively. As shown in Fig. 10, the proposed algorithm reduces the runtime by up to 75.2% and 58.1% on the Chainstore and Retail datasets, by up to 24.7% on the Mushroom dataset, and by only up to 9% on the Connect dataset. For larger batch sizes on the Connect dataset, the proposed algorithm consumes slightly more time than CHUI-DS, because its ARR value and its reduction in join operations are small there. Overall, the proposed algorithm performs better on sparse datasets: the data density is lower, and the proposed algorithm only needs to construct the CN-lists of the items in the latest batch, whereas the CHUI-DS algorithm must use the items of the whole window, so the proposed algorithm searches fewer items and a smaller search space. On the dense datasets, comparing the running times on Mushroom and Connect shows that the higher the ARR value, the larger the difference between the running time of the proposed algorithm and that of the comparison algorithm.
In terms of memory, the more patterns the algorithm generates, the more join operations it must perform and the more memory space the resultsets occupy. The resultset maintenance strategy can also be applied in scenarios such as high utility pattern mining or high utility sequential pattern mining over data streams.