Checkpoint/Restore
Summary
Add support for checkpointing and restoring processes. Checkpointing can be used for fault tolerance, load balancing, or both.
Checkpointing a process at regular intervals makes it possible to restore it after a crash and resume the calculation from the last checkpoint without losing much work. Providing this ability transparently at the OS level removes the need to implement it manually in every program.
Checkpointing a process and restoring it on another system can be used to migrate a process, a process tree, or a container. This makes it possible to distribute load at runtime and to perform maintenance without service interruption, as is already possible with virtual machines.
Owner
- Name: Adrian Reber
- Email: <adrian@lisas.de>
Current status
- Targeted release: Fedora 19
- Last updated: 2012-10-24
- Percentage of completion: 0%
Detailed Description
Checkpoint/restore, as mentioned above, can be used for fault tolerance and load distribution.
Fedora can offer checkpoint/restore by using CRIU (Checkpoint/Restore In Userspace). CRIU has been developed with the goal of being accepted upstream, and most of the necessary kernel patches have already been accepted (as of 2012-10-24). The current release (0.2) of the userspace tools (crtools) can checkpoint and restore containers, which makes it possible to migrate them.
To offer the checkpoint/restore functionality, the crtools package has to be imported into Fedora and the following changes to the kernel RPM are necessary:
diff --git a/config-x86_64-generic b/config-x86_64-generic
index 342b862..c5f8cf9 100644
--- a/config-x86_64-generic
+++ b/config-x86_64-generic
@@ -1,5 +1,8 @@
 CONFIG_64BIT=y
+CONFIG_EXPERT=y
+CONFIG_CHECKPOINT_RESTORE=y
+CONFIG_NAMESPACES=y
 # CONFIG_X86_X32 is not set
 # CONFIG_MK8 is not set
 # CONFIG_MPSC is not set
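Whether a given kernel configuration already enables the three options can be checked with a small script. The following is only a sketch: the function name `missing_cr_options` and its calling convention are made up for this illustration and are not part of crtools or the kernel package.

```shell
# Kernel options this feature page asks to enable.
REQUIRED_OPTIONS="CONFIG_EXPERT CONFIG_CHECKPOINT_RESTORE CONFIG_NAMESPACES"

# missing_cr_options <config-file>
# Hypothetical helper: prints every required option that is not set to "y"
# in the given kernel config file. Returns 0 if all options are enabled,
# 1 if at least one is missing.
missing_cr_options() {
    config_file=$1
    status=0
    for opt in $REQUIRED_OPTIONS; do
        # An enabled option appears as a line of the form CONFIG_FOO=y.
        if ! grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt"
            status=1
        fi
    done
    return $status
}
```

On an installed system the function could be pointed at the config file of the running kernel, for example `missing_cr_options /boot/config-$(uname -r)`, assuming the kernel package installs its configuration there.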
Benefit to Fedora
Fedora gains the ability to checkpoint and restore processes.
Scope
- add the crtools package to Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=869618
- activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE)
How To Test
A process can be dumped with the following command:
crtools dump -D <destination-directory> -t <PID>
and restored with the following command:
crtools restore -D <destination-directory> -t <PID>
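For repeated testing, the two commands above could be wrapped in small shell functions. This is a minimal sketch, not part of crtools: the function names `cr_dump` and `cr_restore` and the `CRTOOLS` variable are assumptions introduced here, and the only options used are the ones shown on this page.

```shell
# Name of the checkpoint/restore tool. The variable is an assumption of this
# sketch; overriding it (for example CRTOOLS=echo) prints the command line
# instead of running it, which is useful for a dry run.
CRTOOLS=${CRTOOLS:-crtools}

# cr_dump <PID> <destination-directory>
# Hypothetical wrapper: checkpoints the process into the given directory.
cr_dump() {
    # Make sure the destination directory exists before dumping into it.
    mkdir -p "$2" || return 1
    "$CRTOOLS" dump -D "$2" -t "$1"
}

# cr_restore <PID> <destination-directory>
# Hypothetical wrapper: restores the process from the images in the directory.
cr_restore() {
    "$CRTOOLS" restore -D "$2" -t "$1"
}
```

A test run might then look like `cr_dump 1234 /tmp/checkpoint` followed by `cr_restore 1234 /tmp/checkpoint`, where the PID and the directory are placeholders.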
User Experience
Users can easily checkpoint and restore processes with the crtools package:
crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>
Dependencies
- add the crtools package to Fedora
- activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE)
Contingency Plan
- disable the three kernel options again
Documentation
Users can easily checkpoint and restore processes with the crtools package:
crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>