Genome Analysis Pipeline I/O Workload Analysis


KIPS Transactions on Software and Data Engineering, Vol. 2, No. 2, pp. 123-130, Feb. 2013
DOI: 10.3745/KTSDE.2013.2.2.123

Abstract

As the size of genomic data increases rapidly, so does the need for high-performance computing systems to process and store it. In this paper, we captured the I/O trace of a system that analyzed 500 million sequence reads in a genome analysis pipeline over 86 hours. The workload created 630 files totaling 1031.7 GByte and deleted 535 files totaling 91.4 GByte. Interestingly, 80% of all accesses in this workload went to only two of the 654 files in the system. Read and write requests in the workload were larger than 512 KByte and 1 MByte, respectively. Most read operations were random, while most write operations were sequential. The throughput and bandwidth observed differed from one processing phase to another.
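
The two trace statistics mentioned above (the random versus sequential split and the concentration of accesses on a few files) can be illustrated with a minimal sketch. The trace format here, a list of `(file_id, op, offset, length)` tuples, and the function names are assumptions made for illustration and are not taken from the paper. The rule that a request is sequential when it starts exactly where the previous request of the same kind on the same file ended is one common convention, not necessarily the definition the authors used.

```python
from collections import Counter


def classify_accesses(trace):
    """Count sequential vs. random requests per operation type.

    trace: list of (file_id, op, offset, length) tuples, op in {"read", "write"}.
    A request is counted as sequential if it begins at the end offset of the
    previous request with the same op on the same file (assumed rule).
    The first request on each (file, op) pair is not classified.
    """
    last_end = {}  # (file_id, op) -> end offset of the previous request
    counts = {"read": Counter(), "write": Counter()}
    for file_id, op, offset, length in trace:
        key = (file_id, op)
        if key in last_end:
            kind = "sequential" if offset == last_end[key] else "random"
            counts[op][kind] += 1
        last_end[key] = offset + length
    return counts


def top_file_share(trace, k=2):
    """Fraction of all requests that target the k most-accessed files."""
    per_file = Counter(file_id for file_id, _, _, _ in trace)
    total = sum(per_file.values())
    top = sum(n for _, n in per_file.most_common(k))
    return top / total if total else 0.0
```

On a real trace, `top_file_share(trace, k=2)` would correspond to the kind of figure reported in the abstract, where two files account for 80% of all accesses.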



Cite this article
[IEEE Style]
K. Y. Lim, D. O. Kim, H. Y. Kim, G. H. Park, M. S. Choi, Y. J. Won, "Genome Analysis Pipeline I/O Workload Analysis," KIPS Transactions on Software and Data Engineering, vol. 2, no. 2, pp. 123-130, 2013. DOI: 10.3745/KTSDE.2013.2.2.123.

[ACM Style]
Kyeong Yeol Lim, Dong Oh Kim, Hong Yeon Kim, Gee Han Park, Min Seok Choi, and You Jip Won. 2013. Genome Analysis Pipeline I/O Workload Analysis. KIPS Transactions on Software and Data Engineering, 2, 2, (2013), 123-130. DOI: 10.3745/KTSDE.2013.2.2.123.