DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop

Computer Science – Distributed, Parallel, and Cluster Computing

Scientific paper

Details

17 pages; 2 figures, 8 plots, and 2 tables; description of DMTCP; Version 3: describing checkpointing both for distributed mul

DMTCP (Distributed MultiThreaded CheckPointing) is a transparent user-level checkpointing package for distributed applications. Checkpointing and restart are demonstrated for more than 20 well-known applications, including MATLAB, Python, TightVNC, MPICH2, OpenMPI, and runCMS. The runCMS application runs as a 680 MB image in memory that includes 540 dynamic libraries, and is used for the CMS experiment of the Large Hadron Collider at CERN. DMTCP transparently checkpoints general cluster computations consisting of many nodes, processes, and threads, as well as typical desktop applications. On 128 distributed cores (32 nodes), checkpoint and restart times are typically 2 seconds, with negligible run-time overhead. Typical checkpoint times are reduced to 0.2 seconds when using forked checkpointing. Experimental results show that checkpoint time remains nearly constant as the number of nodes increases on a medium-size cluster. DMTCP automatically accounts for fork, exec, ssh, mutexes/semaphores, TCP/IP sockets, UNIX domain sockets, pipes, ptys (pseudo-terminals), terminal modes, ownership of controlling terminals, signal handlers, open file descriptors, shared open file descriptors, I/O (including the readline library), shared memory (via mmap), parent-child process relationships, pid virtualization, and other operating system artifacts. Because DMTCP takes an unprivileged, user-space approach, it maintains compatibility across Linux kernels from 2.6.9 through the current 2.6.28. Since DMTCP does not require special kernel modules or kernel patches, it can be incorporated and distributed as a checkpoint-restart module within some larger package.
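The abstract mentions forked checkpointing without describing it. As a rough illustration of the general idea only (not DMTCP's implementation, which checkpoints whole process images rather than a single Python object), the sketch below forks a child that inherits a copy-on-write snapshot of the parent's memory and writes it to disk while the parent keeps computing. The names `forked_checkpoint`, `restart`, and `demo`, and the use of `pickle` as the image format, are assumptions made for this example.

```python
import os
import pickle
import tempfile


def forked_checkpoint(state, path):
    """Write `state` to `path` from a forked child; return the child's pid.

    The child sees the parent's memory as it was at fork() time
    (copy-on-write), so the parent may keep mutating `state` immediately.
    """
    pid = os.fork()
    if pid == 0:
        # Child: serialize the snapshot, then exit at once without
        # running the parent's cleanup handlers.
        status = 1
        try:
            with open(path, "wb") as f:
                pickle.dump(state, f)
            status = 0
        finally:
            os._exit(status)
    return pid


def restart(path):
    """Load the state saved by forked_checkpoint (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    state = {"step": 3, "values": [1, 2, 3]}
    path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
    pid = forked_checkpoint(state, path)
    # The parent continues while the child writes the checkpoint.
    state["step"] = 4
    state["values"].append(4)
    # Wait for the child so the checkpoint file is complete before reading.
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        raise RuntimeError("checkpoint child failed")
    return state, restart(path)
```

In this sketch the parent is blocked only for the duration of `fork()` rather than for the whole write, which is the plausible source of the shorter checkpoint time the abstract reports. The example requires a POSIX system, since `os.fork` is unavailable on Windows.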
