Revision as of 08:15, 12 February 2009
Control Groups
Summary
Control Groups is an upstream feature that allows system resources to be partitioned among individual processes or groups of processes.
Owner
- Name: lwang
- email: lwang@redhat.com
Current status
- Targeted release: Fedora 11
- Last updated: Jan 29, 2009
- Percentage of completion: 75%
Detailed Description
Resource Management/Control Groups
Control Groups provide a mechanism for aggregating/partitioning sets of tasks, and all their future children, into hierarchical groups with specialized behaviour.
Definitions:
A *cgroup* associates a set of tasks with a set of parameters for one or more subsystems.
A *subsystem* is a module that makes use of the task grouping facilities provided by cgroups to treat groups of tasks in particular ways. A subsystem is typically a "resource controller" that schedules a resource or applies per-cgroup limits, but it may be anything that wants to act on a group of processes, e.g. a virtualization subsystem.
A *hierarchy* is a set of cgroups arranged in a tree, such that every task in the system is in exactly one of the cgroups in the hierarchy, and a set of subsystems; each subsystem has system-specific state attached to each cgroup in the hierarchy. Each hierarchy has an instance of the cgroup virtual filesystem associated with it.
At any one time there may be multiple active hierarchies of task cgroups. Each hierarchy is a partition of all tasks in the system.
User level code may create and destroy cgroups by name in an instance of the cgroup virtual file system, specify and query to which cgroup a task is assigned, and list the task pids assigned to a cgroup. Those creations and assignments only affect the hierarchy associated with that instance of the cgroup file system.
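The user-level operations described above map onto ordinary file system commands. The following is a minimal sketch, assuming a cgroup hierarchy is already mounted at the path held in CG_ROOT; the variable and the helper names are hypothetical and only illustrate the idea.

```shell
# Sketch only: CG_ROOT is an assumed mount point of one cgroup hierarchy.
# The helper names below are hypothetical, not part of any existing tool.
CG_ROOT=${CG_ROOT:-/cgroups}

# Create a cgroup by name: a plain mkdir inside the mounted hierarchy.
cg_create() { mkdir -p "$CG_ROOT/$1"; }

# Assign a task to a cgroup by writing its pid to that cgroup's tasks file.
cg_attach() { echo "$2" > "$CG_ROOT/$1/tasks"; }

# List the task pids currently assigned to a cgroup.
cg_tasks() { cat "$CG_ROOT/$1/tasks"; }

# Destroy a cgroup by name; on a cgroup file system this is a plain rmdir.
cg_destroy() { rmdir "$CG_ROOT/$1"; }
```

For instance, `cg_create job1` followed by `cg_attach job1 $$` would place the current shell in the job1 cgroup of that hierarchy only; other mounted hierarchies are unaffected.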
On their own, the only use for cgroups is simple job tracking. The intention is that other subsystems hook into the generic cgroup support to provide new attributes for cgroups, such as accounting for or limiting the resources that processes in a cgroup can access. For example, cpusets (see Documentation/cpusets.txt) allow you to associate a set of CPUs and a set of memory nodes with the tasks in each cgroup.
Benefit to Fedora
Enabling the cgroup sub-features exposes Fedora to various resource partitioning schemes and lets Fedora users try a new feature set that helps them partition their resources any way they want.
Scope
There are several sub-features under control group:
- CGROUPS (grouping infrastructure mechanism)
CGROUPS=y
- CPUSET (cpuset controller, in F10)
CPUSET=y
- CPUACCT (cpu account controller, in F10)
CGROUP_CPUACCT=y
- SCHED (schedule controller, in F10)
CGROUP_SCHED=y
- MEMCTL (memory controller, in F10)
RESOURCE_COUNTERS=y CGROUP_MEM_CONT=y (CGROUP_MEM_RES_CTLR???)
- DEVICE
CGROUP_DEVICE=y
- NETCTL (network controller, New)
NET_CLS_CGROUP=y
- IOCTL (I/O controller)
?? still under development
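To see which of the options listed above are enabled in a given kernel build, a config file can be searched for them. This is a minimal sketch: the helper name check_config is hypothetical, it assumes the options appear in the file with the usual CONFIG_ prefix, and it only matches built-in (=y) settings.

```shell
# Hypothetical helper: report whether each named option is set to "y"
# in a kernel config file. Usage: check_config FILE OPTION...
check_config() (
    config_file=$1
    shift
    status=0
    for opt in "$@"; do
        # Options are assumed to be stored as CONFIG_<name>=y.
        if grep -q "^CONFIG_${opt}=y\$" "$config_file"; then
            echo "$opt enabled"
        else
            echo "$opt not enabled"
            status=1
        fi
    done
    # Non-zero exit status if any option was not enabled.
    exit $status
)
```

A call might look like `check_config /boot/config-$(uname -r) CGROUPS CGROUP_CPUACCT CGROUP_SCHED`, assuming the installed kernel ships its config under /boot.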
How To Test
There are multiple ways to test and use the control group features in Fedora, depending on the feature set you are interested in.
For CPUSET:
0. Targeted mostly at x86 and x86_64.
1. See Documentation/cgroups/cpusets.txt, section 2, "Usage Examples and Syntax". To start a new job that is to be contained within a cpuset, the steps are:
1) mkdir /dev/cpuset
2) mount -t cgroup -ocpuset cpuset /dev/cpuset
3) Create the new cpuset by doing mkdir's and write's (or echo's) in the /dev/cpuset virtual file system.
4) Start a task that will be the "founding father" of the new job.
5) Attach that task to the new cpuset by writing its pid to the /dev/cpuset tasks file for that cpuset.
6) fork, exec or clone the job tasks from this founding father task.
For example, the following sequence of commands will set up a cpuset named "Charlie", containing just CPUs 2 and 3 and Memory Node 1, and then start a subshell 'sh' in that cpuset:
- mount -t cgroup -ocpuset cpuset /dev/cpuset
- cd /dev/cpuset
- mkdir Charlie
- cd Charlie
- /bin/echo 2-3 > cpus
- /bin/echo 1 > mems
- /bin/echo $$ > tasks
- sh
The subshell 'sh' is now running in cpuset Charlie. The next command should display '/Charlie':
- cat /proc/self/cpuset
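The order of the steps above matters: a cpuset is expected to need both CPUs and memory nodes before it can hold tasks. The sketch below wraps the attach step with that check. The helper name cpuset_attach and the CPUSET_ROOT variable are hypothetical, and the file names (cpus, mems, tasks) are assumed to match the example above.

```shell
# Hypothetical helper: attach a pid to a cpuset only after its cpus and
# mems files have been filled in. Usage: cpuset_attach NAME PID
cpuset_attach() (
    # CPUSET_ROOT is an assumed mount point; the text above uses /dev/cpuset.
    dir="${CPUSET_ROOT:-/dev/cpuset}/$1"
    pid=$2
    for f in cpus mems; do
        # Refuse to attach while either file is missing or empty.
        if [ -z "$(cat "$dir/$f" 2>/dev/null)" ]; then
            echo "cpuset $1: '$f' is empty, set it before attaching tasks" >&2
            exit 1
        fi
    done
    echo "$pid" > "$dir/tasks"
)
```

With the Charlie cpuset from the example, `cpuset_attach Charlie $$` would replace the `/bin/echo $$ > tasks` step once cpus and mems have been written.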
For CPUACCT
The CPU accounting controller is used to group tasks using cgroups and account the CPU usage of these groups of tasks.
The CPU accounting controller supports multi-hierarchy groups. An accounting group accumulates the CPU usage of all of its child groups and the tasks directly present in its group.
Accounting groups can be created by first mounting the cgroup filesystem.
- mkdir /cgroups
- mount -t cgroup -ocpuacct none /cgroups
With the above step, the initial or the parent accounting group becomes visible at /cgroups. At bootup, this group includes all the tasks in the system. /cgroups/tasks lists the tasks in this cgroup. /cgroups/cpuacct.usage gives the CPU time (in nanoseconds) obtained by this group which is essentially the CPU time obtained by all the tasks in the system.
New accounting groups can be created under the parent group /cgroups.
- cd /cgroups
- mkdir g1
- echo $$ > g1/tasks
The above steps create a new group g1 and move the current shell process (bash) into it. CPU time consumed by this bash and its children can be obtained from g1/cpuacct.usage and the same is accumulated in /cgroups/cpuacct.usage also.
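Because cpuacct.usage reports nanoseconds, the raw number is hard to read at a glance. A small conversion helper can be sketched as follows; the name cpuacct_seconds is hypothetical, and the function only assumes that the file holds a single integer.

```shell
# Hypothetical helper: print the value of a cpuacct.usage file
# (nanoseconds) as seconds with millisecond precision.
cpuacct_seconds() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk '{ printf "%.3f\n", $1 / 1000000000 }' "$1"
}
```

For the group created above, `cpuacct_seconds /cgroups/g1/cpuacct.usage` would print the CPU time consumed by the shell and its children in seconds.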
For Memory Controller:
0. Configuration
a. Enable CONFIG_CGROUPS
b. Enable CONFIG_RESOURCE_COUNTERS
c. Enable CONFIG_CGROUP_MEM_RES_CTLR (still valid??)
1. Prepare the cgroups
- mkdir -p /cgroups
- mount -t cgroup none /cgroups -o memory
2. Make the new group and move bash into it
- mkdir /cgroups/0
- echo $$ > /cgroups/0/tasks
Now that we are in the 0 cgroup, we can alter the memory limit:
- echo 4M > /cgroups/0/memory.limit_in_bytes
NOTE: We can use a suffix (k, K, m, M, g or G) to indicate values in kilo, mega or gigabytes.
- cat /cgroups/0/memory.limit_in_bytes
4194304
NOTE: The interface has now changed to display the usage in bytes instead of pages
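The suffix arithmetic behind the 4M example can be checked outside the kernel. This is a minimal sketch of the conversion, assuming binary multiples (1K = 1024 bytes), which matches the 4194304 shown above; the helper name to_bytes is hypothetical.

```shell
# Hypothetical helper: convert a value with an optional k/K, m/M or g/G
# suffix into bytes, using binary multiples.
to_bytes() (
    value=$1
    # Strip one trailing suffix character, if present.
    num=${value%[kKmMgG]}
    case $value in
        *[kK]) echo $((num * 1024)) ;;
        *[mM]) echo $((num * 1024 * 1024)) ;;
        *[gG]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
)
```

Here `to_bytes 4M` prints 4194304, the same value that memory.limit_in_bytes reports after `echo 4M`.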
We can check the usage:
- cat /cgroups/0/memory.usage_in_bytes
1216512
A successful write to this file does not guarantee a successful set of this limit to the value written into the file. This can be due to a number of factors, such as rounding up to page boundaries or the total availability of memory on the system. The user is required to re-read this file after a write to guarantee the value committed by the kernel.
- echo 1 > /cgroups/0/memory.limit_in_bytes
- cat /cgroups/0/memory.limit_in_bytes
4096
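The 4096 above is consistent with the limit being rounded up to a page boundary. The sketch below shows that rounding and a write-then-re-read helper. Both names are hypothetical, and the 4096-byte default page size is an assumption that holds on x86 but not everywhere.

```shell
# Hypothetical helper: round a byte count up to the next page boundary.
# The page size defaults to 4096 bytes, which is an assumption (x86).
round_up_to_page() (
    bytes=$1
    page=${2:-4096}
    echo $(( (bytes + page - 1) / page * page ))
)

# Hypothetical helper: write a limit, then print what the file now holds,
# since the committed value may differ from the one written.
set_limit() (
    limit_file=$1
    echo "$2" > "$limit_file" || exit 1
    cat "$limit_file"
)
```

Here `round_up_to_page 1` prints 4096. Comparing the output of `set_limit` with the requested value is one way to notice that the kernel adjusted the limit.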
The memory.failcnt field gives the number of times that the cgroup limit was exceeded.
The memory.stat file gives accounting information. It currently shows the amount of cache, RSS, and active and inactive pages.
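Individual counters can be pulled out of memory.stat with a short filter. This sketch assumes the file holds one "name value" pair per line and that key names such as rss exist; both are assumptions about the file layout, and the helper name memstat_get is hypothetical.

```shell
# Hypothetical helper: print the value of one key from a memory.stat file.
# Assumes one "name value" pair per line. Usage: memstat_get FILE KEY
memstat_get() {
    # Exit status is non-zero when the key is not present.
    awk -v key="$2" '$1 == key { print $2; found = 1 } END { exit !found }' "$1"
}
```

For example, `memstat_get /cgroups/0/memory.stat rss` would print the RSS counter of the 0 cgroup, if that key is present.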
User Experience
End users of this feature will hopefully find it useful for partitioning their server or machine resources into functional units to which those resources can be dedicated.
The control group user interfaces are straightforward and consist of a set of common, easy-to-use command-line operations. Allocating system resources such as a number of CPUs, an amount of memory, or network bandwidth should be easy.
Dependencies
To make this feature fully functional on a Fedora system, the libcg package is needed so that applications can be developed against it.
Other than that, the majority of the implementation is inside the kernel and should be fully functional via the existing user interfaces.
Contingency Plan
The contingency plan for a sub-feature that is still under development is simply not to enable its kernel option at development freeze, so the incomplete sub-feature is not exposed to the Fedora community.
Documentation
In the Linux kernel source tree, the documentation for the individual sub-features is in the control groups directory:
- Documentation/cgroups/cgroups.txt - overall top-level description of the feature
- Documentation/cgroups/cpusets.txt - assigning CPUs and memory nodes to a set of tasks
- Documentation/cgroups/cpuacct.txt - CPU accounting controller, which calculates CPU time usage
- Documentation/cgroups/devices.txt - device files
- Documentation/cgroups/memory.txt
- Documentation/cgroups/resource_counter.txt
Release Notes
Fedora 11 includes a new feature called Control Groups, which allows a system administrator to partition system resources into sub-groups and dedicate those sub-groups to the needs of different applications. It can be used to dedicate a set of pre-allocated system resources to specific applications, such as interactive applications; CPU, memory, or network bandwidth intensive applications; or database applications.
Comments and Discussion