Checkpoint/Restore
Summary
Add support for checkpointing and restoring processes. Checkpointing can be used for fault tolerance and/or load balancing.
Checkpointing a process at regular intervals makes it possible to restart it after a crash and resume the calculation from the last checkpoint without losing much work. Providing this ability transparently at the OS level removes the need to implement it manually in every application.
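The periodic checkpointing described above can be sketched in Python around the `crtools dump` command documented on this page. The helper names, the zero-padded directory layout and the `extra_args` parameter are hypothetical. Later CRIU releases terminate the dumped tasks after a dump by default; whether crtools 0.3 offers an option to keep them running has to be checked with `crtools --help`, and that option (if any) must be passed in `extra_args`.

```python
import os
import subprocess
import time


def dump_argv(pid, images_dir, extra_args=()):
    """Build 'crtools dump -D <dir> -t <PID>' plus any caller-supplied options."""
    return ["crtools", "dump", "-D", images_dir, "-t", str(pid), *extra_args]


def checkpoint_dir(base_dir, index):
    """Hypothetical layout: one zero-padded subdirectory per checkpoint."""
    return os.path.join(base_dir, f"{index:04d}")


def periodic_checkpoint(pid, base_dir, interval_s, count, extra_args=()):
    """Dump `pid` `count` times, `interval_s` seconds apart (needs root).

    Assumption: `extra_args` carries whatever option your crtools version
    provides to leave the task running; without it the first dump may end
    the process.
    """
    for index in range(count):
        if index:
            time.sleep(interval_s)  # wait between checkpoints, not after the last
        images_dir = checkpoint_dir(base_dir, index)
        os.makedirs(images_dir, exist_ok=True)  # create the image directory up front
        subprocess.run(dump_argv(pid, images_dir, extra_args), check=True)


def latest_checkpoint(base_dir):
    """Return the highest-numbered checkpoint directory, or None if there is none."""
    names = [n for n in os.listdir(base_dir) if n.isdigit()]
    if not names:
        return None
    return os.path.join(base_dir, max(names, key=int))
```

For example, `periodic_checkpoint(1234, "/var/tmp/ckpt", 600, 6, extra_args)` with a suitable `extra_args` would take six checkpoints ten minutes apart, and after a crash `latest_checkpoint("/var/tmp/ckpt")` yields the directory to pass to `crtools restore -D`. A more robust version would dump into a temporary directory and rename it only after a successful dump, so that an interrupted dump is never picked as the latest checkpoint.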
Checkpointing a process, process tree or container and restoring it on another system can be used to migrate it, either to distribute the load at runtime or to perform maintenance without service interruption, as is already possible with virtual machines.
Owner
- Name: Adrian Reber
- Email: <adrian@lisas.de>
Current status
- Targeted release: Fedora 19
- Last updated: 2013-01-22
- Percentage of completion: 100%
Detailed Description
As mentioned above, checkpoint/restore can be used for fault tolerance and load distribution.
Fedora can offer checkpoint/restore by using CRIU (Checkpoint/Restore In Userspace). CRIU has been developed with the goal of being accepted upstream, and as of 2012-10-24 most of the necessary kernel patches have already been accepted. The current release (0.3) of the userspace tools (crtools) can checkpoint and restore containers and thus makes it possible to migrate them.
To offer the checkpoint/restore functionality, the crtools package has been imported into Fedora and the following options have been enabled in the kernel RPM (as of 2013-01-30):
diff --git a/config-x86_64-generic b/config-x86_64-generic
index 342b862..c5f8cf9 100644
--- a/config-x86_64-generic
+++ b/config-x86_64-generic
@@ -1,5 +1,8 @@
 CONFIG_64BIT=y
+CONFIG_EXPERT=y
+CONFIG_CHECKPOINT_RESTORE=y
+CONFIG_NAMESPACES=y
 # CONFIG_X86_X32 is not set
 # CONFIG_MK8 is not set
 # CONFIG_MPSC is not set
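Whether a given kernel has the three options above built in can be checked by reading its configuration. Fedora kernels install their configuration as `/boot/config-<release>`; the sketch below parses that format (`NAME=value` lines and `# NAME is not set` comments). The function names are hypothetical, and the check only accepts `=y`, which is sufficient because all three options are boolean.

```python
import platform

# The three options enabled in the kernel RPM, as listed above.
REQUIRED = ("CONFIG_EXPERT", "CONFIG_CHECKPOINT_RESTORE", "CONFIG_NAMESPACES")


def enabled_options(config_text):
    """Return the set of option names set to 'y' in a kernel config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as '# CONFIG_MK8 is not set'
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return enabled


def missing_options(config_text, required=REQUIRED):
    """List the required options that are not set to 'y', in the given order."""
    enabled = enabled_options(config_text)
    return [opt for opt in required if opt not in enabled]


def running_kernel_config_path():
    """Path of the config file for the running kernel (equivalent to uname -r)."""
    return f"/boot/config-{platform.release()}"
```

On a Fedora system, reading `running_kernel_config_path()` and passing its contents to `missing_options` should return an empty list for a kernel built with the configuration above.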
Benefit to Fedora
Fedora offers the ability to checkpoint and restore processes without instrumenting or recompiling them.
Scope
- add the crtools package to Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=869618 (done)
- activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE) (done)
How To Test
It should be possible to dump a process with the following command:
crtools dump -D <destination-directory> -t <PID>
and to restore it with the following command:
crtools restore -D <destination-directory> -t <PID>
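The two test commands above can be wrapped in a small Python driver, for example to script the test against a throwaway process. This is a sketch: the function names are hypothetical, it must run as root, and it uses only the `-D` and `-t` options shown on this page. The restore step is started with `Popen` instead of `run` on the assumption that `crtools restore` may stay in the foreground while the restored tree runs; whether it does depends on the crtools version.

```python
import subprocess


def crtools_argv(action, images_dir, pid):
    """Build 'crtools dump|restore -D <dir> -t <PID>' as documented above."""
    if action not in ("dump", "restore"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["crtools", action, "-D", images_dir, "-t", str(pid)]


def dump_then_restore(pid, images_dir):
    """Checkpoint `pid` into `images_dir`, then start restoring it (needs root)."""
    # Raises CalledProcessError if the dump fails, so nothing is restored.
    subprocess.run(crtools_argv("dump", images_dir, pid), check=True)
    # Assumption: restore may not return until the restored tasks exit.
    return subprocess.Popen(crtools_argv("restore", images_dir, pid))
```

After `dump_then_restore(1234, "/var/tmp/ckpt")`, checking that `/proc/1234` exists again is one way to confirm that the process came back under its original PID.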
User Experience
Users can easily checkpoint and restore processes with the crtools package:
crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>
Dependencies
- add the crtools package to Fedora (done)
- activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE), if the kernel maintainers agree to enable those options (done)
Contingency Plan
- disable the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE)
Documentation
Users can easily checkpoint and restore processes with the crtools package:
crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>
Release Notes
- The CRIU (Checkpoint/Restore In Userspace) project offers a user-space implementation of checkpoint/restore for processes and process groups. With the user-space tools (crtools) included in this release, it is possible to checkpoint processes and restore them at a later time (e.g. after a crash), or to migrate the checkpointed process or process group to another system. CRIU aims to be as transparent as possible, so that the process to be checkpointed needs no instrumentation or recompilation.