Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences


The KIPS Transactions:PartD, Vol. 15, No. 2, pp. 155-162, Apr. 2008
DOI: 10.3745/KIPSTD.2008.15.2.155

Abstract

Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They contain contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because it grows sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of prefixSpan, to significantly reduce the recursive process. However, it is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a fixed-length spanning tree. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
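
To make the problem statement concrete, the following is a minimal brute-force sketch in Python of what "maximal frequent contiguous sequence" means. It is not the method proposed in the paper, nor prefixSpan or MacosVSpan; it only illustrates the definition. The function names, the `min_support` parameter, and the choice to measure support as the number of input sequences containing a pattern are assumptions made for illustration.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Return {pattern: support} for every contiguous pattern that occurs
    in at least min_support of the input sequences (assumed definition)."""
    frequent = {}
    length = 1
    while True:
        # Map each substring of the current length to the set of
        # sequence indices in which it appears.
        containing = defaultdict(set)
        for idx, seq in enumerate(sequences):
            for start in range(len(seq) - length + 1):
                containing[seq[start:start + length]].add(idx)
        level = {p: len(ids) for p, ids in containing.items()
                 if len(ids) >= min_support}
        # Every substring of a frequent pattern is also frequent, so an
        # empty level means no longer frequent pattern can exist.
        if not level:
            return frequent
        frequent.update(level)
        length += 1


def maximal_only(frequent):
    """Keep only patterns that are not contained in a longer frequent one.
    Checking patterns one item longer is sufficient, because any longer
    frequent pattern has a frequent substring of that length."""
    patterns = set(frequent)
    return {p: s for p, s in frequent.items()
            if not any(len(q) == len(p) + 1 and p in q for q in patterns)}
```

For the toy input `["ACGTAC", "TACGTA", "GGACGT"]` with `min_support=2`, `maximal_only(frequent_contiguous(...))` returns `ACGTA` and `TAC`, each with support 2. The level-wise loop starts from length 1 and rescans the data at every length, which illustrates why growing patterns one item at a time becomes costly when the frequent contiguous sequences are hundreds of items long.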



Cite this article
[IEEE Style]
T. H. Kang and J. S. Yoo, "Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences," The KIPS Transactions:PartD, vol. 15, no. 2, pp. 155-162, 2008. DOI: 10.3745/KIPSTD.2008.15.2.155.

[ACM Style]
Tae Ho Kang and Jae Soo Yoo. 2008. Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences. The KIPS Transactions:PartD, 15, 2, (2008), 155-162. DOI: 10.3745/KIPSTD.2008.15.2.155.