Finding Frequent Itemsets Over Data Streams in Confined Memory Space


The KIPS Transactions:PartD, Vol. 15, No. 6, pp. 741-754, Dec. 2008
DOI: 10.3745/KIPSTD.2008.15.6.741

Abstract

Given the characteristics of a data stream, it is important to confine the memory usage of a data mining process regardless of the amount of information generated in the stream. For this purpose, this paper proposes the prime pattern tree (PPT) for finding frequent itemsets over data streams within a confined memory space. Unlike a node of a prefix tree, a node of a PPT can maintain the information necessary to estimate the current supports of several itemsets together. Grouping more items into a prime pattern reduces the total number of nodes, and the length of a prime pattern is controlled by the split_delta Sδ. Both the size and the accuracy of the PPT are therefore determined by Sδ. Accuracy improves as Sδ becomes smaller because, when Sδ is large, the frequencies of many itemsets must be estimated. It is thus important to consider the trade-off between the size of a PPT and the accuracy of the mining result. Based on this characteristic, the size and the accuracy of the PPT can be flexibly controlled by merging or splitting nodes during the mining process. To find all frequent itemsets over a data stream, the PPT takes over the role of the prefix tree in the previously proposed estDec method. This allows memory usage to be optimized efficiently when finding frequent itemsets over a data stream in a confined memory space. Finally, the performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.
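
The size/accuracy trade-off described above can be illustrated with a toy sketch. It is not the PPT structure from the paper. It assumes a single prefix path whose itemset counts are non-increasing, treats `split_delta` as an absolute count gap (the paper may define Sδ differently), and uses linear interpolation as a stand-in estimator. The names `merge_chain` and `estimate` are hypothetical.

```python
def merge_chain(counts, split_delta):
    """Merge consecutive itemsets on one prefix path into groups.

    counts: non-increasing counts of nested itemsets (e.g. a, ab, abc, ...).
    An itemset joins the current group while the group's first count minus
    its own count is at most split_delta (an assumed merging rule).
    Returns a list of (first_index, last_index, max_count, min_count).
    """
    groups = []
    start = 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or counts[start] - counts[i] > split_delta:
            groups.append((start, i - 1, counts[start], counts[i - 1]))
            start = i
    return groups


def estimate(groups, index):
    """Estimate the count of the itemset at `index` from its group's endpoints."""
    for lo, hi, cmax, cmin in groups:
        if lo <= index <= hi:
            if lo == hi:
                return float(cmax)  # unmerged itemset: count is exact
            # Interpolate linearly between the two stored counts (assumption).
            return cmax - (cmax - cmin) * (index - lo) / (hi - lo)
    raise IndexError(index)


counts = [100, 98, 97, 60]            # hypothetical counts for a, ab, abc, abcd
exact = merge_chain(counts, 0)        # no merging: one node per itemset
merged = merge_chain(counts, 5)       # merging: fewer nodes, estimated counts
```

In this toy example a `split_delta` of 0 keeps four nodes with exact counts, while a `split_delta` of 5 collapses them into two nodes and estimates the second itemset at 98.5 instead of 98. A larger threshold saves nodes at the cost of estimation error, which is the trade-off the abstract describes.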



Cite this article
[IEEE Style]
M. J. Kim, S. J. Shin, W. S. Lee, "Finding Frequent Itemsets Over Data Streams in Confined Memory Space," The KIPS Transactions:PartD, vol. 15, no. 6, pp. 741-754, 2008. DOI: 10.3745/KIPSTD.2008.15.6.741.

[ACM Style]
Min Jung Kim, Se Jung Shin, and Won Suk Lee. 2008. Finding Frequent Itemsets Over Data Streams in Confined Memory Space. The KIPS Transactions:PartD, 15, 6, (2008), 741-754. DOI: 10.3745/KIPSTD.2008.15.6.741.