Cgroups

cgroups is a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.

Versions

There are two versions of cgroups, and they can coexist on the same system.
• The original version of cgroups was written by Paul Menage and Rohit Seth. It was merged into the mainline Linux kernel in 2007 and first shipped in kernel 2.6.24 (January 2008). Development and maintenance of cgroups was then taken over by Tejun Heo, who instituted major redesigns without breaking the interface. After cgroup v2 appeared in Linux 4.5, the original version was renamed "Control Group version 1" (cgroup-v1).
• Tejun Heo found that further redesign of v1 could not proceed without breaking the interface, so he added a separate, new system called "Control Group version 2" (cgroup-v2). Unlike v1, cgroup v2 has only a single process hierarchy. In v1 a controller can be assigned to only one hierarchy, so processes in separate hierarchies cannot be managed by the same controller; a single hierarchy sidesteps this issue. cgroup v2 also removes the ability to discriminate between threads and works at the granularity of processes instead, which disables an "abuse" of the system that had led to convoluted APIs.
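
Because v1 and v2 can coexist, a single process may belong to several hierarchies at once. The kernel reports this membership in /proc/<pid>/cgroup as one "hierarchy-ID:controller-list:cgroup-path" line per hierarchy. The single v2 hierarchy appears with ID 0 and an empty controller list (format as described in the cgroups(7) manual page). The following minimal sketch parses that format. The names CgroupEntry and parse_proc_cgroup are hypothetical, and the sample input is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CgroupEntry:
    """One line of /proc/<pid>/cgroup: "hierarchy-ID:controller-list:cgroup-path"."""

    hierarchy_id: int
    controllers: tuple[str, ...]  # empty for the v2 unified hierarchy
    path: str

    @property
    def is_v2(self) -> bool:
        # The v2 hierarchy is reported with ID 0 and an empty controller list.
        return self.hierarchy_id == 0 and not self.controllers


def parse_proc_cgroup(text: str) -> list[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into CgroupEntry records."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split at most twice: the cgroup path itself could contain ':'.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = tuple(name for name in controllers.split(",") if name)
        entries.append(CgroupEntry(int(hierarchy_id), names, path))
    return entries


if __name__ == "__main__":
    # Invented sample from a hybrid system: one v1 hierarchy plus the v2 hierarchy.
    sample = "4:memory:/user.slice\n0::/user.slice/session-1.scope\n"
    for entry in parse_proc_cgroup(sample):
        print("v2" if entry.is_v2 else "v1", entry.controllers, entry.path)
```

On a pure v2 system, the file would contain only the "0::" line.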

Features

One of the design goals of cgroups is to provide a unified interface to many different use cases, from controlling single processes (as nice does, for example) to full operating-system-level virtualization (as provided by OpenVZ, Linux-VServer or LXC, for example). Cgroups provides:
• Resource limiting: groups can be set not to exceed a configured memory limit (which also includes the file system cache), I/O bandwidth limit, CPU quota limit, CPU set limit, or maximum open files.
• Prioritization: some groups may get a larger share of CPU utilization or disk I/O throughput.
• Accounting: measures a group's resource usage, which may be used, for example, for billing purposes.
• Control: freezing groups of processes, and checkpointing and restarting them.
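
In cgroup v2, resource limits are set by writing plain-text values to a group's interface files. This sketch assumes the v2 formats from the kernel's cgroup-v2 documentation. memory.max takes a byte count or the word "max". cpu.max takes "<quota> <period>" in microseconds, so a quota of 50000 in a 100000 period allows half of one CPU. The helper names are hypothetical. They only format values, and the kernel still validates whatever is written.

```python
from typing import Optional

DEFAULT_PERIOD_US = 100_000  # cgroup v2 default cpu.max period (100 ms)


def cpu_max_value(cpus: Optional[float], period_us: int = DEFAULT_PERIOD_US) -> str:
    """Format a cpu.max line allowing `cpus` CPUs' worth of run time per period.

    cpus=None means "no limit" and is written as "max".
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if cpus is None:
        return f"max {period_us}"
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = round(cpus * period_us)  # run time allowed in each period
    return f"{quota_us} {period_us}"


def memory_max_value(limit_bytes: Optional[int]) -> str:
    """Format a memory.max value: a byte count, or "max" for no limit."""
    if limit_bytes is None:
        return "max"
    if limit_bytes < 0:
        raise ValueError("limit_bytes must be non-negative")
    return str(limit_bytes)
```

For example, cpu_max_value(0.5) yields "50000 100000" and memory_max_value(512 * 1024 * 1024) yields "536870912".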

Use

A control group (abbreviated as cgroup) is a collection of processes that are bound by the same criteria and associated with a set of parameters or limits. These groups can be hierarchical, meaning that each group inherits limits from its parent group. The kernel provides access to multiple controllers (also called subsystems) through the cgroup interface. Software that creates and manages cgroups includes libvirt, systemd, Open Grid Scheduler/Grid Engine, and Google's now-discontinued lmctfy.

The Linux kernel documentation contains technical details on setting up and using control groups version 1 and version 2.

Interfaces

Both versions of cgroup act through a pseudo-filesystem, each version with its own filesystem type. Like all filesystems, they can be mounted on any path, but the general convention is to mount one of the versions (generally v2) at a directory under the default sysfs location. As mentioned before, the two cgroup versions can be active at the same time. This also applies to their filesystems, as long as they are mounted at different paths.

Red Hat also provides a guide on creating a systemd service file that causes a process to run in a separate cgroup. The systemd-cgtop command can be used to show the top control groups by resource usage.

V1 coexistence

On a system with v2, v1 can still be mounted and given access to controllers not in use by v2. However, a modern system typically already places all controllers in use in v2, so no controller is available for v1 even if a v1 hierarchy is created. It is possible to clear all uses of a controller from v2 and hand it to v1, but moving controllers between hierarchies after the system is up and running is cumbersome and not recommended.
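
In cgroup v2, the pseudo-filesystem interface works as follows. mkdir creates a group. Writing to the group's interface files configures it. Writing a PID to cgroup.procs moves that process into the group. This sketch assumes a v2 hierarchy mounted at a caller-supplied root (conventionally /sys/fs/cgroup), sufficient privileges, and a memory controller already enabled in the parent's cgroup.subtree_control. The helper create_cgroup and the group name are hypothetical. Because the sketch uses only ordinary file operations, it can also be exercised against a scratch directory.

```python
from pathlib import Path


def create_cgroup(root: Path, name: str, memory_max: str, pid: int) -> Path:
    """Create a child cgroup, set its memory limit, and move a process into it.

    Assumes `root` is a cgroup v2 mount point (or a group below one) and that
    the memory controller is enabled in root's cgroup.subtree_control.
    """
    group = root / name
    # In cgroupfs, mkdir creates the group; the kernel adds its interface files.
    group.mkdir(exist_ok=True)
    # Writing to an interface file configures a controller for this group.
    (group / "memory.max").write_text(memory_max)
    # Writing a PID to cgroup.procs migrates that process into the group.
    (group / "cgroup.procs").write_text(str(pid))
    return group
```

On a real system, a call such as create_cgroup(Path("/sys/fs/cgroup"), "demo", "268435456", os.getpid()) would place the calling process, and any children it later forks, in a group limited to 256 MiB.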

Major evolutions

Redesigns of v1

Redesign of cgroups started in 2013, with additional changes brought by versions 3.15 and 3.16 of the Linux kernel. The following changes concern the kernel before 4.5/4.6, i.e. before cgroup v2 was added. In other words, they describe how cgroup v1 was changed, though most of these changes were also inherited by v2 (after all, v1 and v2 share the same codebase).

Namespace isolation

While not technically part of the cgroups work, a related feature of the Linux kernel is namespace isolation, where groups of processes are separated such that they cannot "see" resources in other groups. For example, a PID namespace provides a separate enumeration of process identifiers within each namespace. Also available are mount, user, UTS (Unix Time Sharing), network, SysV IPC and cgroup namespaces.
• The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from its siblings, processes in its "parent" namespace still see all processes in child namespaces, albeit with different PID numbers.
• The network namespace isolates the network interface controllers (physical or virtual), iptables firewall rules, routing tables, etc. Network namespaces can be connected to each other using the "veth" virtual Ethernet device.
• The UTS namespace allows changing the hostname.
• The mount namespace allows creating a different file system layout, or making certain mount points read-only.
• The IPC namespace isolates System V inter-process communication between namespaces.
• The user namespace isolates user IDs between namespaces.
• The cgroup namespace virtualizes a process's view of its cgroups, so that the cgroup it belonged to when the namespace was created appears as the root of the hierarchy.
Namespaces are created with the "unshare" command or syscall, or as "new" flags in a "clone" syscall.

The "ns" subsystem was added early in cgroups development to integrate namespaces and control groups. If the "ns" cgroup was mounted, each namespace would also create a new group in the cgroup hierarchy.
This was an experiment that was later judged to be a poor fit for the cgroups API, and it was removed from the kernel.

Linux namespaces were inspired by the more general namespace functionality used heavily throughout Plan 9 from Bell Labs.

Conversion to kernfs

Kernfs was introduced into the Linux kernel with version 3.14 in March 2014, the main author being Tejun Heo. One of the main motivators for a separate kernfs is the cgroups file system. Kernfs was created by splitting some of the sysfs logic off into an independent entity. This makes it easier for other kernel subsystems to implement their own virtual file systems with handling for device connect and disconnect, dynamic creation and removal, and other attributes. The change does not affect how cgroups is used, but it makes the code easier to maintain.

New features introduced during v1

Kernel memory control groups (kmemcg) were merged into version 3.8 of the Linux kernel mainline. The kmemcg controller can limit the amount of memory that the kernel can use to manage its own internal processes.

Support for per-group netfilter setup was added in 2014.

Changes after v2

Unlike v1, cgroup v2 has only a single process hierarchy and discriminates between processes, not threads.

Cgroup awareness of the OOM killer

Linux kernel 4.19 (October 2018) made the OOM killer cgroup-aware, adding the ability to kill a cgroup as a single unit and so guarantee the integrity of the workload.
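
In cgroup v2, group-wide OOM killing is enabled per group through the memory.oom.group file. OOM activity is counted in the flat-keyed memory.events file, which has one "key value" pair per line with keys such as oom and oom_kill. This minimal sketch assumes those v2 interface files. The helper names are hypothetical, the caller supplies the cgroup directory, and the sample counter values are invented.

```python
from __future__ import annotations

from pathlib import Path


def enable_group_oom(cgroup_dir: Path) -> None:
    """Ask the OOM killer to kill the whole group as one unit (memory.oom.group)."""
    (cgroup_dir / "memory.oom.group").write_text("1")


def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse a cgroup v2 flat-keyed file such as memory.events into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate trailing blank lines
        key, value = line.split()
        counters[key] = int(value)
    return counters


def oom_kills(cgroup_dir: Path) -> int:
    """Return how many processes in the group the OOM killer has killed."""
    events = parse_flat_keyed((cgroup_dir / "memory.events").read_text())
    return events.get("oom_kill", 0)
```

Enabling the group setting means that when the OOM killer selects any task in the group, it kills the group's tasks together. The workload therefore fails as a whole instead of being left partially running.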

Adoption

Various projects use cgroups as their basis, including CoreOS, Docker (in 2013), Hadoop, Jelastic, Kubernetes, lmctfy (Let Me Contain That For You), LXC (Linux Containers), systemd, and Mesos and Mesosphere. On 29 October 2019, the Fedora Project modified Fedora 31 to use cgroup v2 by default.

Source: Wikipedia
