Kubernetes defines a set of building blocks ("primitives") that collectively provide mechanisms that deploy, maintain, and scale applications based on
CPU, memory or custom metrics. Kubernetes is
loosely coupled and extensible to meet the needs of different workloads. The internal components as well as extensions and containers that run on Kubernetes rely on the Kubernetes API. The platform exerts its control over compute and storage resources by defining resources as objects, which can then be managed as such. Kubernetes follows the
primary/replica architecture. The components of Kubernetes can be divided into those that manage an individual
node and those that are part of the control plane.
Control plane The Kubernetes master node handles the Kubernetes control plane of the cluster, managing its workload and directing communication across the system. The Kubernetes control plane consists of various components, each its own process, that can run either on a single master node or on multiple masters supporting high-availability clusters. Securing the control plane typically relies on TLS encryption, RBAC, a strong authentication method, and network separation.
Etcd Etcd is a persistent, lightweight, distributed,
key-value data store (originally developed as part of CoreOS). It reliably stores the configuration data of the cluster, representing the overall state of the cluster at any given point in time. Etcd favors consistency over availability in the event of a network partition (see
CAP theorem). The consistency is crucial for correctly scheduling and operating services.
API server The
API server serves the Kubernetes
API using
JSON over
HTTP, which provides both the internal and external interface to Kubernetes. The API server processes and validates
REST requests, and updates the state of the API objects in etcd, thereby allowing clients to configure workloads and containers across worker nodes. The API server uses etcd's watch API to monitor the cluster, roll out critical configuration changes, and restore any divergence of the cluster's state back to the desired state declared in etcd. As an example, a human operator may specify that three instances of a particular "pod" (see below) need to be running, and etcd stores this fact. If the Deployment controller finds that only two instances are running (conflicting with the etcd declaration), it schedules the creation of an additional instance of that pod. Kubernetes allows running multiple schedulers within a single cluster. As such, scheduler
plug-ins may be developed either as in-process extensions to the default scheduler or as separate schedulers, as long as they conform to the Kubernetes scheduling framework. This allows cluster administrators to extend or modify the behavior of the default Kubernetes scheduler according to their needs.
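The three-replica example above can be expressed declaratively in a manifest. A minimal sketch (the names `example-deployment`, `app: example`, and the image are illustrative, not from the source):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment   # hypothetical name
spec:
  replicas: 3                # the desired state recorded in etcd
  selector:
    matchLabels:
      app: example           # pods the Deployment controller manages
  template:                  # pod template used to create new instances
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: web
        image: nginx:1.25    # any container image would do here
```

If one of the three pods disappears, the Deployment controller observes the mismatch against this declared state and schedules a replacement.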
Controllers A
controller is a reconciliation loop that drives the actual cluster state toward the desired state, communicating with the API server to create, update, and delete the resources it manages (e.g., pods or service endpoints). Label selectors often form part of a controller's definition, specifying the set of pods that the controller manages. The kubelet monitors the state of each pod, and if a pod is not in the desired state, it is redeployed to the same node. Node status is relayed to the API server every few seconds via heartbeat messages. Once the control plane detects a node failure, a higher-level controller is expected to observe this state change and launch replacement pods on another healthy node.
Container runtime A
container runtime is responsible for the lifecycle of containers, including launching, reconciling and killing of containers. kubelet interacts with container runtimes via the Container Runtime Interface (CRI), which
decouples the maintenance of core Kubernetes from the actual CRI implementation. Originally, kubelet interfaced exclusively with the
Docker runtime through a "dockershim". However, between November 2020 and April 2022, Kubernetes deprecated the
shim in favor of interfacing directly with the container runtime through containerd, or of replacing Docker with a CRI-compliant runtime. With the release of v1.24 in May 2022, the "dockershim" was removed entirely. Examples of popular container runtimes that are compatible with kubelet include
containerd (initially supported via Docker) and
CRI-O.
kube-proxy kube-proxy is an implementation of a
network proxy and a
load balancer, and it supports the service abstraction along with other networking operations.
Namespaces Kubernetes partitions the resources it manages into non-overlapping sets called namespaces. They are intended for use in environments with many users spread across multiple teams or projects, or even for separating environments like development, test, and production.
Pods The basic scheduling unit in Kubernetes is a
pod, which consists of one or more containers that are guaranteed to be co-located on the same node. Within the pod, all containers can reference each other. A
container resides inside a pod. A container is the lowest level of a microservice, holding the running application, its libraries, and their dependencies.
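A pod with two co-located containers can be sketched as follows (names and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod          # hypothetical name
spec:
  containers:
  - name: app                # main application container
    image: nginx:1.25
  - name: sidecar            # second container, guaranteed to run on the same node
    image: busybox:1.36
    command: ["sleep", "infinity"]
```

Both containers share the pod's network namespace, so they can reach each other over localhost.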
Workloads Kubernetes supports several abstractions of workloads at a higher level than simple pods. This allows users to declaratively define and manage these high-level abstractions, instead of having to manage individual pods themselves. Several of these abstractions, supported by a standard installation of Kubernetes, are described below.
ReplicaSets, ReplicationControllers and Deployments A
ReplicaSet's purpose is to maintain a stable set of replica pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical Pods. The ReplicaSet can also be said to be a grouping mechanism that lets Kubernetes maintain the number of instances that have been declared for a given pod. The definition of a ReplicaSet uses a selector, whose evaluation will result in identifying all pods that are associated with it. A
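A minimal ReplicaSet sketch; the selector identifies the pods the ReplicaSet maintains (names and the image are illustrative):

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: example-replicaset   # hypothetical name
spec:
  replicas: 2                # number of identical pods to keep running
  selector:
    matchLabels:
      app: example           # evaluation of this selector identifies the managed pods
  template:
    metadata:
      labels:
        app: example         # must match the selector above
    spec:
      containers:
      - name: web
        image: nginx:1.25
```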
ReplicationController serves the same purpose as a ReplicaSet and behaves similarly: it ensures that a specified number of pod replicas are always running. The ReplicationController workload was the predecessor of the ReplicaSet, but was eventually deprecated in favor of ReplicaSets, which support set-based label selectors.
StatefulSets While scaling stateless applications is only a matter of adding more running pods, doing so for stateful workloads is harder, because the state needs to be preserved if a pod is restarted. If the application is scaled up or down, the state may need to be redistributed. Databases are an example of stateful workloads. When run in high-availability mode, many databases come with the notion of a primary instance and secondary instances; in this case, the ordering of instances is important. Other applications like Apache Kafka distribute the data amongst their brokers; hence, one broker is not the same as another. In this case, the notion of instance uniqueness is important. StatefulSets are controllers that enforce the properties of uniqueness and ordering amongst instances of a pod, and can be used to run stateful applications.
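Kubernetes addresses ordering and per-instance identity with the StatefulSet workload. A minimal sketch (names, the image, and sizes are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-db           # hypothetical name
spec:
  serviceName: example-db    # headless Service giving each pod a stable DNS identity
  replicas: 3                # pods are created in order: example-db-0, -1, -2
  selector:
    matchLabels:
      app: example-db
  template:
    metadata:
      labels:
        app: example-db
    spec:
      containers:
      - name: db
        image: postgres:16   # illustrative stateful workload
  volumeClaimTemplates:      # each replica gets its own persistent volume
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

The ordinal suffix in each pod's name gives it a unique, stable identity, so a restarted replica reattaches to its own volume rather than to an interchangeable one.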
DaemonSets DaemonSets are responsible for ensuring that a pod is created on every single node in the cluster. Generally, most workloads scale in response to a desired replica count, depending on the availability and performance requirements as needed by the application. However, in other scenarios it may be necessary to deploy a pod to every single node in the cluster, scaling up the number of total pods as nodes are added and
garbage collecting them as they are removed. This is particularly helpful for use cases where the workload has some dependency on the actual node or host machine, such as log collection, ingress controllers, and storage services.
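A log-collection DaemonSet can be sketched as follows; note the absence of a replica count, since one pod runs per node (names and the image are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-log-agent    # hypothetical name
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluent/fluent-bit:2.2   # illustrative log-collection image
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log     # node-local directory the workload depends on
```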
Services A Kubernetes
service is a set of pods that work together, such as one tier of a
multi-tier application. The set of pods that constitutes a service is defined by a label selector. Service discovery assigns the service a stable IP address and
DNS name, and load-balances traffic to that IP address in a
round-robin manner among the pods matching the selector (even as failures cause pods to move from machine to machine).
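A minimal Service sketch; all pods carrying the selector's label receive traffic sent to the service's stable IP (names and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-service      # hypothetical name; also becomes the DNS name
spec:
  selector:
    app: backend             # label selector defining the set of member pods
  ports:
  - protocol: TCP
    port: 80                 # stable port on the service's cluster IP
    targetPort: 8080         # port the selected pods actually listen on
```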
Volumes Filesystems in Kubernetes containers provide
ephemeral storage by default. This means that a restart of the pod will wipe out any data in such containers; therefore, this form of storage is quite limiting for anything but trivial applications. A Kubernetes
volume provides persistent storage that exists for the lifetime of the pod itself. This storage can also be used as shared disk space for containers within the pod. Volumes are mounted at specific mount points within the container, which are defined by the pod configuration, and cannot mount onto other volumes or link to other volumes. The same volume can be mounted at different points in the file system tree by different containers.
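A sketch of two containers sharing one volume mounted at different points (names, images, and paths are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-shared       # hypothetical name
spec:
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "while true; do date >> /data/log; sleep 5; done"]
    volumeMounts:
    - name: shared
      mountPath: /data       # mount point in the first container
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "touch /logs/log; tail -f /logs/log"]
    volumeMounts:
    - name: shared
      mountPath: /logs       # same volume, different mount point
  volumes:
  - name: shared
    emptyDir: {}             # exists for the lifetime of the pod
```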
ConfigMaps and Secrets A common application challenge is deciding where to store and manage configuration information, some of which may contain sensitive data. Configuration data can be anything as fine-grained as individual properties, or coarse-grained information like entire configuration files such as
JSON or
XML documents. Kubernetes provides two closely related mechanisms to deal with this need, known as
ConfigMaps and
Secrets, both of which allow for configuration changes to be made without requiring an application rebuild. The data from ConfigMaps and Secrets is made available to every instance of the application to which these objects have been bound via the Deployment. A Secret or ConfigMap is sent to a node only if a pod on that node requires it, and it is held only in memory on the node. Once the pod that depends on the Secret or ConfigMap is deleted, the in-memory copy of all bound Secrets and ConfigMaps is deleted as well. The data from a ConfigMap or Secret is accessible to the pod in one of the following ways:
• As environment variables, which are consumed by the kubelet from the ConfigMap when the container is launched;
• Mounted within a volume accessible within the container's filesystem, which supports automatic reloading without restarting the container.
The biggest difference between a Secret and a ConfigMap is that Secrets are specifically designed for holding secure and confidential data, although they are not encrypted at rest by default; additional setup is required to fully secure the use of Secrets within the cluster. Secrets are often used to store confidential or sensitive data like certificates, credentials to work with image registries, passwords, and SSH keys.
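Both consumption paths can be sketched together; here a ConfigMap and a Secret are bound into a pod as environment variables (all names and values are illustrative placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-config       # hypothetical name
data:
  LOG_LEVEL: info
---
apiVersion: v1
kind: Secret
metadata:
  name: example-secret       # hypothetical name
type: Opaque
stringData:
  DB_PASSWORD: changeme      # placeholder; real secrets should not live in manifests
---
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
  - name: app
    image: nginx:1.25
    env:
    - name: LOG_LEVEL        # consumed by the kubelet at container launch
      valueFrom:
        configMapKeyRef:
          name: example-config
          key: LOG_LEVEL
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: example-secret
          key: DB_PASSWORD
```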
Labels and selectors Kubernetes enables clients (users or internal components) to attach key-value pairs called
labels to any API object in the system, such as pods and
nodes. Correspondingly,
label selectors are queries against labels that resolve to matching objects. For example, the selector tier=backend AND release_track=canary matches all objects carrying both of those labels. Just like labels, field selectors also let one select Kubernetes resources. Unlike labels, the selection is based on attribute values inherent to the resource being selected, rather than on user-defined categorization. metadata.name and metadata.namespace are field selectors that are present on all Kubernetes objects. Other selectors that can be used depend on the object/resource type.
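In a workload manifest such as a ReplicaSet or Deployment, that selector can be written in set-based form (a fragment, not a complete manifest):

```yaml
# Selector fragment as it would appear under a workload's spec
selector:
  matchLabels:
    tier: backend               # equality-based: tier=backend
  matchExpressions:
  - key: release_track
    operator: In
    values: ["canary"]          # set-based: release_track in (canary)
```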
Add-ons Add-ons are additional features of the Kubernetes cluster implemented as applications running within it. The pods may be managed by Deployments, ReplicationControllers, and so on. There are many add-ons. Some of the more important are:
DNS: Cluster DNS is a DNS server, in addition to the other DNS server(s) in the environment, which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
Web UI: This is a general-purpose, web-based UI for Kubernetes clusters. It allows administrators to manage and troubleshoot applications running in the cluster, as well as the cluster itself.
Resource monitoring: Container Resource Monitoring records metrics about containers in a central database, and provides a UI for browsing that data.
Cost monitoring: Kubernetes cost monitoring applications allow breakdown of costs by pods, nodes, namespaces, and labels.
Cluster-level logging: To prevent the loss of event data in the event of node or pod failures, container logs can be saved to a central log store with a search/browsing interface. Kubernetes provides no native storage for log data, but one can integrate many existing logging solutions into the Kubernetes cluster.
Storage Containers emerged as a way to make software portable. The container contains all the packages needed to run a service. The provided file system makes containers extremely portable and easy to use in development. A container can be moved from development to test or production with no or relatively few configuration changes. Historically, Kubernetes was suitable only for stateless services. However, many applications have a database, which requires persistence, leading to the creation of persistent storage for Kubernetes. Implementing persistent storage for containers is one of the top challenges for Kubernetes administrators, DevOps and cloud engineers. Containers may be ephemeral, but more and more of their data is not, so one needs to ensure the data's survival in case of container termination or hardware failure. When deploying containers with Kubernetes or containerized applications, organizations often realize that they need persistent storage. They need to provide fast and reliable storage for databases, root images and other data used by the containers. In addition to its landscape, the
Cloud Native Computing Foundation (CNCF) has published other information about Kubernetes persistent storage, including a blog helping to define the container attached storage pattern. This pattern can be thought of as one that uses Kubernetes itself as a component of the storage system or service. More information about the relative popularity of these and other approaches can be found on the CNCF's landscape survey as well, which showed that OpenEBS, a stateful persistent storage platform from DataCore Software, and Rook, a storage orchestration project, were the two projects most likely to be in evaluation as of the fall of 2019. Container attached storage is a type of data storage that emerged as Kubernetes gained prominence. The container attached storage approach or pattern relies on Kubernetes itself for certain capabilities while delivering primarily block, file, and object interfaces to workloads running on Kubernetes. Common attributes of container attached storage include the use of extensions to Kubernetes, such as custom resource definitions, and the use of Kubernetes itself for functions that would otherwise be separately developed and deployed for storage or data management. Examples of functionality delivered by custom resource definitions or by Kubernetes itself include retry logic, delivered by Kubernetes itself, and the creation and maintenance of an inventory of available storage media and volumes, typically delivered via a custom resource definition.
Container Storage Interface (CSI) In Kubernetes version 1.9, the initial Alpha release of Container Storage Interface (CSI) was introduced. Previously, storage volume plug-ins were included in the Kubernetes distribution. By creating a standardized CSI, the code required to interface with external storage systems was separated from the core Kubernetes code base. Just one year later, the CSI feature was made Generally Available (GA) in Kubernetes. == API ==