An Analysis System for Whole Genomic Sequence Using String B - Tree


The KIPS Transactions:PartA, Vol. 8, No. 4, pp. 509-516, Dec. 2001
DOI: 10.3745/KIPSTA.2001.8.4.509

Abstract

As a result of many genome projects, the genomic sequences of many organisms have been revealed. Various methods, such as global alignment and local alignment, are used to analyze these sequences, and k-mer analysis is one of them. This analysis examines the frequencies of all k-mers, or the symmetry of those frequencies, where a k-mer is a subsequence of k consecutive bases. However, existing in-memory algorithms are not applicable to k-mer analysis, because a whole genomic sequence is usually too large a text to fit in main memory. Efficient external-memory data structures and algorithms are therefore needed. The string B-tree is a data structure that works in external memory and is well suited to pattern matching. In this paper, we improve the string B-tree so that it can be applied efficiently to k-mer analysis, and we present the results of k-mer analysis for C. elegans and 30 other genomic sequences. We also present a visualization system, based on CGR (Chaotic Game Representation), that lets users investigate the distribution and symmetry of the frequencies of all k-mers. Finally, we describe a method for finding the signature, that is, a part of the sequence that is similar to the whole genomic sequence.
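To make the quantities in the abstract concrete, here is a minimal in-memory sketch in Python of three computations it mentions: counting all k-mers, placing each k-mer in a 2^k × 2^k CGR frequency grid, and pairing each k-mer with its reverse complement to inspect symmetry. This is not the paper's method. It keeps every count in a dictionary, which is exactly the approach the abstract says does not scale to whole genomes, and it does not use a string B-tree. The CGR corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)) and the reading of "symmetry" as reverse-complement symmetry are assumptions, and the helper names `count_kmers`, `reverse_complement`, `cgr_cell`, `cgr_grid`, and `reverse_complement_pairs` are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Assumed CGR corner assignment as (x, y); conventions differ between papers.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer, skipping windows with non-ACGT symbols."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if all(base in CORNER for base in kmer):
            counts[kmer] += 1
    return counts


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def cgr_cell(kmer: str) -> tuple[int, int]:
    """Return the (row, col) cell of a k-mer in a 2^k x 2^k CGR grid.

    In CGR the last base chooses the coarsest quadrant, so bases are
    consumed from the end of the k-mer toward its start; each one adds
    the next, finer bit to the column (x) and row (y) indices.
    """
    row = col = 0
    for base in reversed(kmer):
        x, y = CORNER[base]
        col = col * 2 + x
        row = row * 2 + y
    return row, col


def cgr_grid(counts: Counter, k: int) -> list[list[int]]:
    """Arrange k-mer counts as a 2^k x 2^k frequency matrix (rows indexed by y)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for kmer, n in counts.items():
        if len(kmer) != k:
            raise ValueError(f"expected a {k}-mer, got {kmer!r}")
        row, col = cgr_cell(kmer)
        grid[row][col] += n
    return grid


def reverse_complement_pairs(counts: Counter) -> list[tuple[str, int, str, int]]:
    """List each k-mer / reverse-complement pair once, with both counts."""
    seen: set[str] = set()
    pairs = []
    for kmer in sorted(counts):
        if kmer in seen:
            continue
        rc = reverse_complement(kmer)
        seen.update((kmer, rc))
        # Counter returns 0 for a missing key without inserting it.
        pairs.append((kmer, counts[kmer], rc, counts[rc]))
    return pairs


if __name__ == "__main__":
    counts = count_kmers("ACGTACGT", 2)
    print(dict(counts))
    print(reverse_complement_pairs(counts))
    for row in cgr_grid(counts, 2):
        print(row)
```

For example, `count_kmers("ACGTACGT", 2)` finds AC, CG, and GT twice each and TA once; AC and GT are reverse complements of each other, so their equal counts illustrate the kind of symmetry being measured. For a whole genome, the same counts would have to come from an external-memory index such as the string B-tree the paper describes, rather than from an in-memory dictionary.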




Cite this article
[IEEE Style]
J. H. Choi and H. G. Cho, "An Analysis System for Whole Genomic Sequence Using String B - Tree," The KIPS Transactions:PartA, vol. 8, no. 4, pp. 509-516, 2001. DOI: 10.3745/KIPSTA.2001.8.4.509.

[ACM Style]
Jeong Hyeon Choi and Hwan Gue Cho. 2001. An Analysis System for Whole Genomic Sequence Using String B - Tree. The KIPS Transactions:PartA, 8, 4, (2001), 509-516. DOI: 10.3745/KIPSTA.2001.8.4.509.