
Birth of a Kubernetes Pod!
October 2nd 2024

Hi, today we will discuss the magic behind pod creation, with a dive into the K8s architecture by dissecting its components.

Going back to the topic of the day, …

As container technology continues to evolve and power cloud applications, managing containers efficiently for security, performance, agility, and reliability remains a challenge. Kubernetes (K8s), an open-source container orchestration platform, automates the deployment, scaling, and management of containerized applications. A Kubernetes cluster consists of multiple nodes that communicate with each other for health management, and these nodes fall into two sets: control plane nodes, which handle all of the management functions of the cluster, and worker nodes, which run the application workloads. All core control plane components typically run on the same machine, although they can be distributed across multiple nodes in high-availability setups. By default, control plane nodes are tainted to prevent them from running user workloads.
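If you want to see this split on your own cluster, the commands below give a quick sketch (node names will differ; the taint shown is the one kubeadm applies on recent versions):

```bash
# List all nodes with their roles (control plane vs worker).
kubectl get nodes -o wide

# Control plane nodes typically carry a NoSchedule taint, e.g.
# node-role.kubernetes.io/control-plane:NoSchedule, which keeps user pods off them.
kubectl describe node <control-plane-node-name> | grep -i taints
```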

Now, when the end-user — that’s me, Vidya! — requests the creation of a pod via a kubectl command, the kube-apiserver in the control plane immediately receives and processes the request. The API server performs authentication (verifying who I am) and authorization (checking if I’m allowed to create a pod in this cluster). If both checks pass, the request is persisted into etcd.
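For illustration, here is roughly what that request could look like; the pod name (vidya-pod) and the image are placeholders I’m using for this walkthrough, not anything special to Kubernetes:

```bash
# Authorization check: is the current user allowed to create pods in this namespace?
kubectl auth can-i create pods --namespace default

# Submit a minimal pod definition to the kube-apiserver.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: vidya-pod        # hypothetical name used throughout this post
spec:
  containers:
  - name: web
    image: nginx:1.27    # any container image would do
EOF
```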

etcd is a distributed key-value store that holds the full configuration and state data of the Kubernetes cluster. It maintains the desired state (what the user wants) and the actual state (what is currently running). The entire functional model of Kubernetes is based on continuously reconciling the actual state to the desired state. etcd stores both, making it the source of truth for the cluster’s declarative approach.
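If you are curious (and have direct access to an etcd member, which normally only the API server does), the pod we just created is stored under the /registry prefix. This is a rough sketch assuming etcdctl v3 and kubeadm-style certificate paths, which will vary per cluster:

```bash
# Inspect the pod record directly in etcd (v3 API).
# The certificate paths below are typical kubeadm defaults and may differ in your cluster.
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry/pods/default/vidya-pod --prefix --keys-only
```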

Going back to where we left off: after the pod definition is written to etcd, the API server responds with a success message. However, the pod is not yet running; it only exists as a record in the cluster. So, at this moment, the actual state is “Vidya’s pod is not running,” while the desired state is “Vidya’s pod is running.” The next critical component is the kube-scheduler. It constantly watches the API server (using informers and event-based mechanisms) for any newly created pods that are not yet assigned to a node. Once such a pod is detected, it evaluates the available worker nodes based on various parameters from the pod spec, like resource requirements, affinity/anti-affinity rules, tolerations, and taints.
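You can catch the pod in this in-between state yourself: until the scheduler picks a node, it reports Pending and its spec.nodeName is still empty (again, vidya-pod is just my example pod):

```bash
# An accepted-but-unscheduled pod shows phase Pending and an empty spec.nodeName.
kubectl get pod vidya-pod -o jsonpath='{.status.phase}{"\n"}{.spec.nodeName}{"\n"}'
```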

Once the most suitable worker node (say, w1) is selected, the scheduler updates the pod’s specification in the API server by assigning the nodeName. This binding action is what we call scheduling. As the API server updates the pod definition in etcd, the kubelet on node w1 is notified that there is a pod assigned to it.
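After the binding, the same field is filled in with the chosen node (w1 is just the example name used in this post; yours will differ):

```bash
# Once the scheduler binds the pod, spec.nodeName holds the selected node.
kubectl get pod vidya-pod -o jsonpath='{.spec.nodeName}{"\n"}'   # e.g. w1

# The NODE column of the wide output shows the same thing.
kubectl get pod vidya-pod -o wide
```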

The kubelet on the selected node now takes over. It fetches the pod specification, pulls the required container images (via the Container Runtime Interface), and launches the pod on the node, ensuring volume mounts, security settings, and runtime configurations are respected.
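You can follow the kubelet’s side of the story through the pod’s events; on most clusters you’ll see something along the lines of Scheduled, then Pulling/Pulled for the image, then Created and Started for the container:

```bash
# The event stream records what the kubelet did on the node.
kubectl describe pod vidya-pod

# Or list the pod's events directly, sorted by time.
kubectl get events --field-selector involvedObject.name=vidya-pod --sort-by=.lastTimestamp
```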

The journey of the pod’s life thus begins. But to continuously keep the actual state of all resources (including pods) in line with the desired state, we have another control plane component called the Controller Manager (CM). This component runs all major controllers, like the ReplicaSet controller, Deployment controller, DaemonSet controller, etc., and their job is to reconcile: they ensure that the number of running pods, or any other resource, matches the desired state.
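As a concrete (purely illustrative) example, the Deployment below declares a desired state of three replicas; the Deployment and ReplicaSet controllers inside the controller manager then keep nudging the cluster toward that count. The names and image are made up for this post:

```bash
# A Deployment declaring a desired state of three replicas.
# The Deployment and ReplicaSet controllers keep reconciling toward this count.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vidya-web          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: vidya-web
  template:
    metadata:
      labels:
        app: vidya-web
    spec:
      containers:
      - name: web
        image: nginx:1.27
EOF
```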

For example, if the pod running on worker node w1 accidentally gets deleted or crashes, the ReplicaSet controller detects this and immediately creates a replacement pod to meet the desired replica count. This process is called reconciliation. While the ReplicaSet controller is a built-in, native controller, many custom controllers can be created by the system administrator of the cluster as well.
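A quick way to see reconciliation in action, assuming the illustrative vidya-web Deployment from above: delete one of its pods and watch a replacement appear almost immediately:

```bash
# Pick one of the Deployment's pods and delete it.
POD=$(kubectl get pods -l app=vidya-web -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD"

# Watch the ReplicaSet controller create a replacement to restore the count of three.
kubectl get pods -l app=vidya-web -w
```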

Last but not least, one vital component on the worker node is the kube-proxy. While it doesn’t participate in the pod creation flow, it is essential for pod networking. kube-proxy handles service discovery and ensures that pods can communicate with each other or with external services, typically using iptables (or IPVS) rules under the hood, depending on the proxy mode.
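To make that concrete, here is a minimal ClusterIP Service for the illustrative vidya-web pods, plus one way to peek at the rules kube-proxy programs for it (only meaningful on a node where kube-proxy runs in iptables mode):

```bash
# A ClusterIP Service giving the vidya-web pods a stable virtual IP.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: vidya-web          # hypothetical name
spec:
  selector:
    app: vidya-web
  ports:
  - port: 80
    targetPort: 80
EOF

# In iptables mode, kube-proxy tags its rules with the service name as a comment,
# so they can be spotted in the node's rule set (run on a worker node, as root).
sudo iptables-save | grep vidya-web
```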
