From Fedora Project Wiki
 
(13 intermediate revisions by 2 users not shown)
Line 1: Line 1:
= Checkpoint/Restore
= Checkpoint/Restore =


== Summary ==
== Summary ==
Line 16: Line 16:
== Current status ==
== Current status ==
* Targeted release: [[Releases/19 | Fedora 19 ]]  
* Targeted release: [[Releases/19 | Fedora 19 ]]  
* Last updated: 2012-10-24
* Last updated: 2013-01-22
* Percentage of completion: 0%
* Percentage of completion: 100%


== Detailed Description ==
== Detailed Description ==
Checkpointing/restore, as mentioned above, can be used for fault tolerance and load distribution.
Checkpointing/restore, as mentioned above, can be used for fault tolerance and load distribution.


Fedora can offer checkpoint/restore by using CRIU (Checkpoint/Restore In Userspace). [http://criu.org/ CRIU] has been developed with the goal to be accepted by upstream and most patches necessary have already been accepted (as of 2012-10-24) in the kernel. The current release (0.2) of the userspace tools (crtools) offers the ability to checkpoint/restore containers and thus offering the ability to migrate containers.
Fedora can offer checkpoint/restore by using CRIU (Checkpoint/Restore In Userspace). [http://criu.org/ CRIU] has been developed with the goal to be accepted by upstream and most patches necessary have already been accepted (as of 2012-10-24) in the kernel. The current release (0.3) of the userspace tools (crtools) offers the ability to checkpoint/restore containers and thus offering the ability to migrate containers.


To offer the checkpoint/restore functionality the package crtools has to be imported into Fedora and following changes are necessary to the kernel RPM:
To offer the checkpoint/restore functionality the package crtools has been imported into Fedora and following options have been enabled in the kernel RPM (as of 2013-01-30):


<pre>
<pre>
Line 43: Line 43:


== Benefit to Fedora ==
== Benefit to Fedora ==
Fedora offers possibility to checkpoint/restore processes.
Fedora offers the possibility to checkpoint/restore processes.


== Scope ==
== Scope ==
* add the crtools package to Fedora
* add the crtools package to Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=869618 (done)
* activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE)
* activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE) (done)


== How To Test ==
== How To Test ==
<!-- This does not need to be a full-fledged document.  Describe the dimensions of tests that this feature is expected to pass when it is done.  If it needs to be tested with different hardware or software configurations, indicate them.  The more specific you can be, the better the community testing can be.
A process should be able to be dumped with following command:


Remember that you are writing this how to for interested testers to use to check out your feature - documenting what you do for testing is OK, but it's much better to document what *I* can do to test your feature.
<pre>crtools dump -D <destination-directory> -t <PID></pre>


A good "how to test" should answer these four questions:
and restored with following command:


0. What special hardware / data / etc. is needed (if any)?
<pre>crtools restore -D <destination-directory> -t <PID></pre>
1. How do I prepare my system to test this feature? What packages
need to be installed, config files edited, etc.?
2. What specific actions do I perform to check that the feature is
working like it's supposed to?
3. What are the expected results of those actions?
-->


== User Experience ==
== User Experience ==
Line 72: Line 66:


== Dependencies ==
== Dependencies ==
* add the crtools package to Fedora
* add the crtools package to Fedora (done)
* activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE)
* activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE), if the kernel maintainers agree to enable those options (done)


== Contingency Plan ==
== Contingency Plan ==
Line 79: Line 73:


== Documentation ==
== Documentation ==
<!-- Is there upstream documentation on this feature, or notes you have written yourself?  Link to that material here so other interested developers can get involved. -->
Users can easily checkpoint and restore processes with the crtools package:
*
 
<pre>crtools dump -D <destination-directory> -t <PID></pre>
 
<pre>crtools restore -D <destination-directory> -t <PID></pre>
 
http://criu.org/


== Release Notes ==
== Release Notes ==
<!-- The Fedora Release Notes inform end-users about what is new in the release.  Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
<!-- The Fedora Release Notes inform end-users about what is new in the release.  Examples of past release notes are here: http://docs.fedoraproject.org/release-notes/ -->
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this feature, indicate them here.  You can also link to upstream documentation if it satisfies this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->
<!-- The release notes also help users know how to deal with platform changes such as ABIs/APIs, configuration or data file formats, or upgrade concerns.  If there are any such changes involved in this feature, indicate them here.  You can also link to upstream documentation if it satisfies this need.  This information forms the basis of the release notes edited by the documentation team and shipped with the release. -->
*
* The CRIU (Checkpoint/Restore in User-space) projects offers a user-space implementation of process and process group checkpoint/restore. With the user-space tools crtools included in this release it is possible checkpoint processes and restore them at a later time again (e.g. after a crash) or migrate the checkpointed process or process group to another system. CRIU aims to be as transparent as possible so that no instrumentation or re-compilation of the process to be checkpointed is necessary.


== Comments and Discussion ==
== Comments and Discussion ==
* See [[Talk:Features/Checkpoint_Restore]]
* See [[Talk:Features/Checkpoint_Restore]]


 
[[Category:FeatureAcceptedF19]]
[[Category:FeaturePageIncomplete]]
<!-- When your feature page is completed and ready for review -->
<!-- When your feature page is completed and ready for review -->
<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->
<!-- remove Category:FeaturePageIncomplete and change it to Category:FeatureReadyForWrangler -->
<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->
<!-- After review, the feature wrangler will move your page to Category:FeatureReadyForFesco... if it still needs more work it will move back to Category:FeaturePageIncomplete-->
<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->
<!-- A pretty picture of the page category usage is at: https://fedoraproject.org/wiki/Features/Policy/Process -->

Latest revision as of 09:48, 5 February 2013

Checkpoint/Restore

Summary

Add support for checkpointing and restoring processes. Checkpointing processes can be used for fault tolerance and/or load balancing.

Checkpointing a process at regular intervals makes it possible to restore the process after a crash and resume the calculation without losing much data. Providing this ability transparently at the OS level removes the need to implement it manually in every application.

Checkpointing a process on one system and restoring it on another can be used to migrate a process, process tree, or container. This allows load to be distributed at runtime and maintenance to be performed without service interruption, as is possible with virtual machines.

Owner

  • Email: <adrian@lisas.de>

Current status

  • Targeted release: Fedora 19
  • Last updated: 2013-01-22
  • Percentage of completion: 100%

Detailed Description

Checkpoint/restore, as mentioned above, can be used for fault tolerance and load distribution.

Fedora can offer checkpoint/restore by using CRIU (Checkpoint/Restore In Userspace). CRIU has been developed with the goal of being accepted upstream, and most of the necessary patches have already been accepted into the kernel (as of 2012-10-24). The current release (0.3) of the userspace tools (crtools) can checkpoint and restore containers, and thus makes it possible to migrate containers.

To offer the checkpoint/restore functionality, the crtools package has been imported into Fedora and the following options have been enabled in the kernel RPM (as of 2013-01-30):

diff --git a/config-x86_64-generic b/config-x86_64-generic
index 342b862..c5f8cf9 100644
--- a/config-x86_64-generic
+++ b/config-x86_64-generic
@@ -1,5 +1,8 @@
 CONFIG_64BIT=y
 
+CONFIG_EXPERT=y
+CONFIG_CHECKPOINT_RESTORE=y
+CONFIG_NAMESPACES=y
 # CONFIG_X86_X32 is not set
 # CONFIG_MK8 is not set
 # CONFIG_MPSC is not set
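
To confirm that a given kernel was built with these three options, the kernel configuration file can be inspected. The following is a minimal sketch, not part of the feature itself: the function name `check_cr_options` is hypothetical, and the default path `/boot/config-$(uname -r)` assumes the kernel package installs its configuration under `/boot`.

```shell
# Report whether each kernel option needed for checkpoint/restore is enabled.
# Usage: check_cr_options [config-file]
# check_cr_options is a hypothetical helper; the default path below is an
# assumption about where the kernel configuration file is installed.
check_cr_options() {
    config="${1:-/boot/config-$(uname -r)}"
    missing=0
    for opt in CONFIG_EXPERT CONFIG_CHECKPOINT_RESTORE CONFIG_NAMESPACES; do
        # An enabled built-in option appears as a line of the form NAME=y
        if grep -q "^${opt}=y\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    # Exit status is 0 only if all three options are enabled
    return "$missing"
}
```

Calling `check_cr_options` without arguments checks the running kernel; passing a path checks any other configuration file.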

Benefit to Fedora

Fedora offers the possibility to checkpoint/restore processes.

Scope

  • add the crtools package to Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=869618 (done)
  • activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE) (done)

How To Test

A process can be dumped with the following command:

crtools dump -D <destination-directory> -t <PID>

and restored with the following command:

crtools restore -D <destination-directory> -t <PID>
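
The two commands above can be combined into a small test helper. This is only a sketch under stated assumptions: the names `run`, `checkpoint_and_restore`, and `DRY_RUN` are hypothetical, and only the `crtools dump` and `crtools restore` invocations shown above are used. With `DRY_RUN=1` the commands are printed rather than executed, which allows the sequence to be reviewed first.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
# run and DRY_RUN are hypothetical names used only for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Dump the process with the given PID into an image directory, then restore it.
# Usage: checkpoint_and_restore <PID> <destination-directory>
checkpoint_and_restore() {
    pid="$1"
    images="$2"
    # Each step runs only if the previous one succeeded
    run mkdir -p "$images" &&
    run crtools dump -D "$images" -t "$pid" &&
    run crtools restore -D "$images" -t "$pid"
}
```

For example, setting `DRY_RUN=1` and calling `checkpoint_and_restore 1234 /tmp/ckpt` prints the `mkdir`, `crtools dump`, and `crtools restore` commands for PID 1234 without running them.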

User Experience

Users can easily checkpoint and restore processes with the crtools package:

crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>

Dependencies

  • add the crtools package to Fedora (done)
  • activate the three kernel options mentioned above (CONFIG_EXPERT, CONFIG_NAMESPACES, CONFIG_CHECKPOINT_RESTORE), if the kernel maintainers agree to enable those options (done)

Contingency Plan

  • disable kernel options

Documentation

Users can easily checkpoint and restore processes with the crtools package:

crtools dump -D <destination-directory> -t <PID>
crtools restore -D <destination-directory> -t <PID>

http://criu.org/

Release Notes

  • The CRIU (Checkpoint/Restore in User-space) project offers a user-space implementation of checkpoint/restore for processes and process groups. With the user-space tools (crtools) included in this release, it is possible to checkpoint processes and restore them at a later time (e.g. after a crash), or to migrate the checkpointed process or process group to another system. CRIU aims to be as transparent as possible, so that no instrumentation or recompilation of the process to be checkpointed is necessary.

Comments and Discussion