A Multistriped Checkpointing Scheme for the Fault-tolerant Cluster Computers


The KIPS Transactions:PartA, Vol. 13, No. 7, pp. 607-614, Dec. 2006
DOI: 10.3745/KIPSTA.2006.13.7.607

Abstract

To enhance the performance of processes running on a cluster system that writes checkpoints to its global stable storage, a checkpointing scheme should reduce process delay by managing each node's checkpoints to fit the network load. For this reason, a cluster system with a single I/O space on a distributed RAID needs a checkpointing scheme suited to achieving maximum I/O performance and the best rollback-recovery efficiency. In this paper, we improve the striped checkpointing scheme by making the stripe group size dynamic, adapting it to the network bandwidth variation at the moment of checkpointing. To analyze the performance of the resulting multistriped checkpointing scheme, we ran the Linpack HPC benchmark with MPI on our own cluster system with up to 512 virtual nodes. The benchmark results showed that, under heavy system load, the multistriped checkpointing scheme outperforms the striped checkpointing scheme in both checkpoint-writing efficiency and rollback recovery.
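The abstract describes adapting the stripe group size to the network bandwidth available when a checkpoint is taken, but it does not state the sizing rule. The Python sketch below illustrates one possible rule under assumptions that are ours, not the paper's: nodes inside a stripe group write their checkpoints concurrently, groups write one after another, and the group size is the number of per-node write streams that the currently measured bandwidth can sustain. The names `choose_stripe_group_size`, `partition_into_stripe_groups`, `available_bw_mb_s`, and `per_node_rate_mb_s` are hypothetical.

```python
import math
from typing import List


def choose_stripe_group_size(num_nodes: int,
                             available_bw_mb_s: float,
                             per_node_rate_mb_s: float) -> int:
    """Hypothetical sizing rule (not the paper's algorithm): let as many
    nodes write at once as the currently available bandwidth can sustain."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be positive")
    if per_node_rate_mb_s <= 0:
        raise ValueError("per_node_rate_mb_s must be positive")
    # Number of concurrent per-node write streams the bandwidth can carry.
    fit = math.floor(available_bw_mb_s / per_node_rate_mb_s)
    # Always allow at least one writer and never more than the cluster size.
    return max(1, min(num_nodes, fit))


def partition_into_stripe_groups(node_ids: List[int],
                                 group_size: int) -> List[List[int]]:
    """Split nodes into consecutive stripe groups. Under the assumed model,
    nodes in one group write concurrently and groups run sequentially."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    return [node_ids[i:i + group_size]
            for i in range(0, len(node_ids), group_size)]


if __name__ == "__main__":
    # Re-evaluate the group size at each checkpoint as bandwidth changes.
    nodes = list(range(512))
    for bw in (2000.0, 800.0, 150.0):
        size = choose_stripe_group_size(len(nodes), bw, 10.0)
        groups = partition_into_stripe_groups(nodes, size)
        print(f"bandwidth={bw} MB/s -> group size {size}, {len(groups)} groups")
```

For example, with 512 nodes, 800 MB/s of available bandwidth, and 10 MB/s per writer, this rule yields groups of 80 nodes: seven groups, the last holding 32 nodes. When the measured bandwidth drops, the groups shrink and more sequential write rounds are needed, which is the kind of trade-off a dynamic stripe group size is meant to manage.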




Cite this article
[IEEE Style]
Y. S. Chang, "A Multistriped Checkpointing Scheme for the Fault-tolerant Cluster Computers," The KIPS Transactions:PartA, vol. 13, no. 7, pp. 607-614, 2006. DOI: 10.3745/KIPSTA.2006.13.7.607.

[ACM Style]
Yun Seok Chang. 2006. A Multistriped Checkpointing Scheme for the Fault-tolerant Cluster Computers. The KIPS Transactions:PartA, 13, 7, (2006), 607-614. DOI: 10.3745/KIPSTA.2006.13.7.607.