An Update-Efficient, Disk-Based Inverted Index Structure for Keyword Search on Data Streams


KIPS Transactions on Software and Data Engineering, Vol. 5, No. 4, pp. 171-180, Apr. 2016
10.3745/KTSDE.2016.5.4.171,   PDF Download:

Abstract

As social networking services such as twitter become increasingly popular, data streams are widely prevalent these days. In order to search data accumulated from data streams efficiently, the use of an index structure is essential. In this paper, we propose an update-efficient, disk-based inverted index structure for efficient keyword search on data streams. When new data arrive at the data stream, the index needs to be updated to incorporate the new data. The traditional inverted index is very inefficient to update in terms of disk I/O, because all index data stored in the disk need to be read and written to the disk each time the index is updated. To solve this problem, we divide the whole inverted index into a sequence of inverted indices with exponentially increasing size. When new data arrives, it is first inserted into the smallest index and, later, the small indices are merged with the larger indices, which leads to a small amortize update cost for each new data. Furthermore, when indices stored in the disk are merged with each other, we minimize the disk I/O cost incurred for the merge operation, resulting in an even smaller update cost. Through various experiments, we compare the update efficiency of the proposed index structure with the previous one, and show the performance advantage of the proposed structure in terms of the update cost.


Statistics
Show / Hide Statistics

Statistics (Cumulative Counts from September 1st, 2017)
Multiple requests among the same browser session are counted as one view.
If you mouse over a chart, the values of data points will be shown.


Cite this article
[IEEE Style]
E. J. Park and K. Y. Lee, "An Update-Efficient, Disk-Based Inverted Index Structure for Keyword Search on Data Streams," KIPS Transactions on Software and Data Engineering, vol. 5, no. 4, pp. 171-180, 2016. DOI: 10.3745/KTSDE.2016.5.4.171.

[ACM Style]
Eun Ju Park and Ki Yong Lee. 2016. An Update-Efficient, Disk-Based Inverted Index Structure for Keyword Search on Data Streams. KIPS Transactions on Software and Data Engineering, 5, 4, (2016), 171-180. DOI: 10.3745/KTSDE.2016.5.4.171.