(Moved to FeatureReadyForFesco for approval (#1007))
(Feature accepted on Jan 30 FESCo meeting (#1007))
Revision as of 19:26, 30 January 2013
Checkpoint/Restore
Summary
Add support to checkpoint and restore processes. Checkpointing processes can be used for fault tolerance and/or load balancing.
Checkpointing a process at regular intervals makes it possible to restore the process after a crash and resume the calculation without losing much data. Providing this ability transparently at the OS level removes the need to implement it manually in every process.
Checkpointing a process and restoring it on another system can be used to migrate a process, process tree, or container. This allows load to be distributed at runtime and maintenance to be performed without service interruption, as is possible with virtual machines.
Owner
- Name: Adrian Reber
- Email: <adrian@lisas.de>
Current status
- Targeted release: Fedora 19
- Last updated: 2013-01-22
- Percentage of completion: 50%
Detailed Description
Checkpointing/restore, as mentioned above, can be used for fault tolerance and load distribution.
Fedora can offer checkpoint/restore by using CRIU (Checkpoint/Restore In Userspace). CRIU has been developed with the goal of being accepted upstream, and most of the necessary patches have already been accepted into the kernel (as of 2012-10-24). The current release (0.3) of the userspace tools (crtools) can checkpoint and restore containers, which makes it possible to migrate them.
To offer the checkpoint/restore functionality, the crtools package has been imported into Fedora, and the following changes to the kernel RPM are still necessary:
diff --git a/config-x86_64-generic b/config-x86_64-generic
index 342b862..c5f8cf9 100644
--- a/config-x86_64-generic
+++ b/config-x86_64-generic
@@ -1,5 +1,8 @@
 CONFIG_64BIT=y
+CONFIG_EXPERT=y
+CONFIG_CHECKPOINT_RESTORE=y
+CONFIG_NAMESPACES=y
 # CONFIG_X86_X32 is not set
 # CONFIG_MK8 is not set
 # CONFIG_MPSC is not set
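Whether a given kernel build carries the three options from this change can be checked against its config file. The following is a minimal sketch, not part of the feature itself: the function name check_cr_config is hypothetical, and the default path /boot/config-$(uname -r) is an assumption about where the running kernel's config is installed.

```shell
#!/bin/sh
# Report whether the three kernel options named in this feature are set to
# "y" in a kernel config file.
# check_cr_config is a hypothetical helper name; the default path is an
# assumption and can be overridden with the first argument.
check_cr_config() {
    config="${1:-/boot/config-$(uname -r)}"
    missing=0
    for opt in CONFIG_EXPERT CONFIG_CHECKPOINT_RESTORE CONFIG_NAMESPACES; do
        # Match only an exact "OPTION=y" line, not "# OPTION is not set".
        if grep -q "^${opt}=y\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    # Exit status 0 only if all three options are enabled.
    return "$missing"
}
```

Calling check_cr_config without arguments inspects the assumed default path; passing a path, for example to a config file from the kernel source package, checks that file instead.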
Benefit to Fedora
Fedora offers the possibility to checkpoint/restore processes.
Scope
- add the crtools package to Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=869618 (done)
- activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE)
How To Test
It should be possible to dump a process with the following command:
crtools dump -D <destination-directory> -t <PID>
and to restore it with the following command:
crtools restore -D <destination-directory> -t <PID>
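For repeated testing, the two commands above can be wrapped in small shell functions. This is only a sketch under stated assumptions: the function names checkpoint_pid and restore_pid and the CRTOOLS override variable are hypothetical and not part of crtools, and only the dump and restore invocations shown above are used. Running the tools probably requires root privileges.

```shell
#!/bin/sh
# CRTOOLS is a hypothetical override so the binary can be swapped out,
# e.g. for a dry run; it defaults to the crtools command shown above.
CRTOOLS="${CRTOOLS:-crtools}"

# checkpoint_pid <PID> <image-directory>
# Creates the image directory if needed, then dumps the process into it.
checkpoint_pid() {
    mkdir -p "$2" || return 1
    "$CRTOOLS" dump -D "$2" -t "$1"
}

# restore_pid <PID> <image-directory>
# Refuses to run if the image directory does not exist.
restore_pid() {
    if [ ! -d "$2" ]; then
        echo "no image directory: $2" >&2
        return 1
    fi
    "$CRTOOLS" restore -D "$2" -t "$1"
}
```

A test run would then be checkpoint_pid <PID> <destination-directory> followed by restore_pid <PID> <destination-directory>. Setting CRTOOLS=echo before calling the functions prints the arguments that would be passed instead of running crtools.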
User Experience
Users can easily checkpoint and restore processes with the crtools package:
crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>
Dependencies
- add the crtools package to Fedora (done)
- activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE), if the kernel maintainers agree to enable those options
Contingency Plan
- disable kernel options
Documentation
Users can easily checkpoint and restore processes with the crtools package:
crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>
Release Notes
- The CRIU (Checkpoint/Restore in User-space) project offers a user-space implementation of checkpoint/restore for processes and process groups. With the user-space tools (crtools) included in this release, it is possible to checkpoint processes and restore them at a later time (e.g. after a crash), or to migrate a checkpointed process or process group to another system. CRIU aims to be as transparent as possible, so that no instrumentation or recompilation of the process to be checkpointed is necessary.