A Striped Checkpointing Scheme for the Cluster System with the Distributed RAID


The KIPS Transactions:PartA, Vol. 10, No. 2, pp. 123-130, Jun. 2003
DOI: 10.3745/KIPSTA.2003.10.2.123

Abstract

This paper presents a new striped checkpointing scheme for serverless cluster computers, in which the local disks attached to the cluster nodes collectively form a distributed RAID with a single I/O space. Striping enables parallel I/O on the distributed disks, and staggering avoids a network bottleneck in the distributed RAID. We demonstrate how dynamic striping and staggering reduce the checkpointing overhead and increase availability for communication-intensive applications. The Linpack HPC Benchmark and MPI programs are used to evaluate the performance of these checkpointing schemes on a 16-node cluster system. The benchmark results show the benefits of the striped checkpointing scheme compared to existing schemes, and they are useful for designing efficient checkpointing schemes that support fast rollback recovery from any single node failure in a cluster system.
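The abstract does not specify the paper's data layout or its staggering policy. As a rough illustration only, the following Python sketch assumes a RAID-5-style layout: each checkpoint image is split into data stripes plus one XOR parity stripe, so the loss of any single node's disk can be repaired by XOR-ing the surviving stripes. Staggering is modeled, again as an assumption, as writing checkpoints in fixed-size groups of nodes so that only one group uses the network at a time. The function names `stripe_with_parity`, `rebuild`, `reassemble`, and `staggered_rounds` and the group size are hypothetical, and parity is kept on the last disk for simplicity, whereas RAID-5 proper rotates it.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def _xor_columns(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(image: bytes, n_disks: int) -> list[bytes]:
    """Split a checkpoint image into n_disks - 1 equal data stripes plus one XOR parity stripe.

    Simplifying assumption: parity always sits on the last disk (RAID-5 would rotate it).
    """
    if n_disks < 2:
        raise ValueError("need at least two disks for parity")
    k = n_disks - 1
    size = -(-len(image) // k)              # ceiling division
    padded = image.ljust(size * k, b"\0")   # zero-pad to a multiple of k
    data = [padded[i * size:(i + 1) * size] for i in range(k)]
    return data + [_xor_columns(data)]


def rebuild(stripes: list[Optional[bytes]]) -> list[Optional[bytes]]:
    """Reconstruct at most one lost stripe (marked None) from the survivors."""
    lost = [i for i, s in enumerate(stripes) if s is None]
    if len(lost) > 1:
        raise ValueError("single-parity striping tolerates only one failure")
    repaired = list(stripes)
    if lost:
        survivors = [s for s in stripes if s is not None]
        repaired[lost[0]] = _xor_columns(survivors)
    return repaired


def reassemble(stripes: list[Optional[bytes]], length: int) -> bytes:
    """Concatenate the data stripes (all but the parity) and drop the zero padding."""
    return b"".join(s for s in stripes[:-1] if s is not None)[:length]


def staggered_rounds(n_nodes: int, group_size: int) -> list[list[int]]:
    """Hypothetical staggering: group node ids into rounds; one group writes at a time."""
    return [list(range(start, min(start + group_size, n_nodes)))
            for start in range(0, n_nodes, group_size)]


if __name__ == "__main__":
    image = b"process state of rank 3"
    damaged: list[Optional[bytes]] = list(stripe_with_parity(image, 4))
    damaged[1] = None                       # simulate losing one node's disk
    print(reassemble(rebuild(damaged), len(image)))  # b'process state of rank 3'
    print(staggered_rounds(16, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
```

With 16 nodes and a hypothetical group size of 4, checkpoint writes proceed in four rounds instead of all 16 nodes contending for the interconnect at once. The trade-off is a longer total checkpoint window in exchange for less network contention per round.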




Cite this article
[IEEE Style]
Y. S. Chang, "A Striped Checkpointing Scheme for the Cluster System with the Distributed RAID," The KIPS Transactions:PartA, vol. 10, no. 2, pp. 123-130, 2003. DOI: 10.3745/KIPSTA.2003.10.2.123.

[ACM Style]
Yun Seok Chang. 2003. A Striped Checkpointing Scheme for the Cluster System with the Distributed RAID. The KIPS Transactions:PartA, 10, 2, (2003), 123-130. DOI: 10.3745/KIPSTA.2003.10.2.123.