K8s Pod Creation Flow

From deployment to running pod. Explore what happens behind the scenes.

Components: kubectl (user), API Server, etcd, Deployment controller, ReplicaSet controller, Scheduler, Kubelet (worker node)

User Submits Manifest

What Happens

You run `kubectl apply -f deployment.yaml`. kubectl parses the YAML and converts it to JSON.

Why It Matters

The CLI is the primary interface. It prepares the request before sending it to the cluster.

Technical Detail
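kubectl validates the manifest client-side. If the object already exists, it computes a patch with a 3-way merge (last-applied configuration, live object, new manifest) and sends a PATCH request to the API Server; otherwise it sends a POST request to create the object.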
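The idea behind a 3-way merge can be illustrated with a minimal sketch. It assumes plain nested dictionaries and uses `None` to mean "delete this field"; kubectl's real patch format treats lists specially, which is not modeled here. The function name `three_way_patch` and the sample fields in the usage note are hypothetical.

```python
def three_way_patch(last_applied, desired, live):
    """Return a merge-style patch that moves `live` toward `desired`.

    Fields that were applied last time but are gone from `desired` are
    deleted (value None). Fields that exist only in `live` (set by
    controllers or other users) are left untouched.
    """
    patch = {}

    # Deletions: previously applied by us, no longer wanted, still present.
    for key in last_applied:
        if key not in desired and key in live:
            patch[key] = None

    # Additions and changes: anything in `desired` that differs from `live`.
    for key, want in desired.items():
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            prev = last_applied.get(key)
            nested = three_way_patch(prev if isinstance(prev, dict) else {}, want, have)
            if nested:
                patch[key] = nested
        elif want != have:
            patch[key] = want

    return patch
```

For example, if the last-applied spec had `replicas: 3` and `paused: True`, the new manifest has only `replicas: 5`, and the live object also carries a `revisionHistoryLimit` that someone else set, the sketch changes `replicas`, deletes `paused`, and leaves `revisionHistoryLimit` alone.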

Example

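apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the template labels;
  # the "app: nginx" label is illustrative.
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2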

Key Takeaways

Declarative

You tell Kubernetes what you want (a Pod), and the Control Plane works to make it happen.

The Scheduler

The unsung hero. It filters out every Node that cannot run your Pod, then scores the remaining candidates to pick the best fit.

Ephemeral

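Pods are mortal. They are born, they run, and they die. A dead Pod is never resurrected or moved to another Node; its controller replaces it with a new one. (Containers inside a live Pod can still be restarted in place by the Kubelet.)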

The Engineering of Kubernetes: The Choreography of State

Kubernetes is not merely an engine for running containers; it is fundamentally a deeply robust, distributed state machine. The entire architecture is built around a single paradigm: Level-Triggered Declarative State. You declare what the world should look like, and independent, asynchronous control loops work relentlessly to make the physical reality match your declaration.


Part 1: The API Server and etcd

The kube-apiserver is the absolute center of the Kubernetes universe. It is a stateless REST API that serves as the only component allowed to communicate directly with etcd, the highly-available, distributed key-value store acting as the cluster's permanent memory.

When you run kubectl apply -f pod.yaml, the API server executes a strict defensive pipeline:

  • Authentication: Verifies who you are, for example by checking your client TLS certificate or bearer token.
  • Authorization (RBAC): Checks if your specific role (e.g., "Developer") has the "create" verb permission for the "pods" resource in the target namespace.
  • Mutating Admission: Webhooks intercept the payload and modify it on the fly (e.g., Istio automatically injecting an Envoy proxy sidecar container into your Pod spec).
  • Validating Admission: Webhooks perform final semantic checks (e.g., rejecting the pod if it doesn't specify CPU limits as required by security policy).

Only if all checks pass does the API server serialize the object (Protobuf by default for built-in resources) and commit it to etcd. From that write onward, the Pod exists in the cluster's Desired State, even though no container is running yet.
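The request pipeline above can be sketched as a chain of stages that each either pass the request along or reject it. This is a toy model, not the API server's code: the `Request` type, the `KNOWN_USERS` and `ALLOWED` tables, the injected `proxy` sidecar, and the dictionary standing in for etcd are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


class Rejected(Exception):
    """Raised when any stage of the pipeline refuses the request."""


@dataclass
class Request:
    user: Optional[str]
    verb: str
    resource: str
    obj: dict


# Hypothetical policy data standing in for real authenticators and RBAC rules.
KNOWN_USERS = {"alice"}
ALLOWED = {("alice", "create", "pods")}


def authenticate(req):
    if req.user not in KNOWN_USERS:
        raise Rejected("401 Unauthorized")


def authorize(req):
    if (req.user, req.verb, req.resource) not in ALLOWED:
        raise Rejected("403 Forbidden")


def mutate(req):
    # Mutating admission: inject a sidecar, as a service-mesh webhook might.
    containers = req.obj["spec"]["containers"]
    if not any(c["name"] == "proxy" for c in containers):
        containers.append({
            "name": "proxy",
            "image": "example/proxy:1.0",
            "resources": {"limits": {"cpu": "100m"}},
        })


def validate(req):
    # Validating admission: every container must declare a CPU limit.
    for c in req.obj["spec"]["containers"]:
        if "cpu" not in c.get("resources", {}).get("limits", {}):
            raise Rejected(f"container {c['name']} has no CPU limit")


def handle(req, store):
    """Run the stages in order; persist only if none of them raised."""
    for stage in (authenticate, authorize, mutate, validate):
        stage(req)
    store[req.obj["metadata"]["name"]] = req.obj  # stands in for the etcd write
    return "201 Created"
```

The ordering is the point of the sketch: mutation runs before validation, so the validating stage sees the object in its final form, and nothing reaches the store unless every stage passes.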

Part 2: The Scheduler's Mathematical Filter

The kube-scheduler continuously watches the API Server for any Pod that has an empty spec.nodeName field. Its job is narrow: pick the best-fit node for each pending Pod. It uses a two-phase algorithm:

  1. Filtering (Hard Constraints): It eliminates nodes that physically cannot run the Pod. Does the node lack sufficient CPU/RAM? Does it have a taint that the Pod doesn't tolerate? Is the node out of disk space? If a cluster has 1000 nodes, filtering might quickly reduce the eligible candidates to 50.
  2. Scoring (Soft Constraints): It ranks the surviving 50 nodes. It assigns higher scores to nodes that already have the required container image cached locally, or to nodes that spread the Pod's replicas across different availability zones (Anti-Affinity) to maximize fault tolerance.

The Scheduler selects the highest-scoring Node and issues a "Binding" POST request to the API server, officially updating the Pod's nodeName.
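The filter-then-score procedure described above can be sketched in a few lines. The `Node` and `Pod` types, the two scoring rules, and the point weights (10 and 20) are illustrative assumptions; the real scheduler runs a configurable set of plugins with their own weights.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu: float                      # cores
    free_mem: int                        # MiB
    zone: str
    taints: set = field(default_factory=set)
    cached_images: set = field(default_factory=set)


@dataclass
class Pod:
    image: str
    cpu: float
    mem: int
    tolerations: set = field(default_factory=set)


def feasible(pod, node):
    """Filtering: hard constraints that a node must satisfy."""
    return (node.free_cpu >= pod.cpu
            and node.free_mem >= pod.mem
            and node.taints <= pod.tolerations)   # every taint is tolerated


def score(pod, node, zones_in_use):
    """Scoring: soft preferences; the weights here are arbitrary."""
    points = 0
    if pod.image in node.cached_images:
        points += 10                              # image already on the node
    if node.zone not in zones_in_use:
        points += 20                              # spreads replicas across zones
    return points


def schedule(pod, nodes, zones_in_use):
    """Return the chosen node name, or None if the Pod must stay Pending."""
    candidates = [n for n in nodes if feasible(pod, n)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(pod, n, zones_in_use)).name
```

A node that fails any hard constraint never reaches the scoring phase, however attractive its score would have been. That separation is what keeps soft preferences from overriding resource limits or taints.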

Part 3: The Kubelet and the Container Runtime

Every worker node, physical or virtual, runs an agent called the Kubelet. It continuously watches the API server, filtering strictly for Pods assigned to its own nodeName.

When the Kubelet sees its new assignment, it acts as the local orchestrator:

  • It tells the container runtime (like containerd or CRI-O), through the CRI (Container Runtime Interface), to create the Pod sandbox and pull the container images from the registry.
  • During sandbox creation, the runtime invokes the CNI (Container Network Interface) plugin (like Calico or Cilium), which wires up a Virtual Ethernet (veth) pair connecting the Pod's isolated network namespace to the host's root network and assigns the Pod a cluster-unique IP address.
  • It instructs the runtime to create and start the containers inside their Linux cgroups and namespaces, bringing the application process to life.

The Kubelet then updates the API server: "Status: Running."
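The order of operations on the node can be sketched with a fake runtime that only records what it is asked to do. The method names loosely mirror CRI calls (RunPodSandbox, PullImage, CreateContainer, StartContainer), but `FakeRuntime`, `sync_pod`, and the returned IP address are hypothetical, and the model folds network setup into sandbox creation.

```python
class FakeRuntime:
    """Stand-in for a container runtime; it just logs each request."""

    def __init__(self):
        self.calls = []

    def run_pod_sandbox(self, pod_name):
        # In this model, sandbox creation is where the network plugin runs.
        self.calls.append(("RunPodSandbox", pod_name))
        return "10.244.1.7"  # hypothetical Pod IP

    def pull_image(self, image):
        self.calls.append(("PullImage", image))

    def create_container(self, pod_name, container_name):
        self.calls.append(("CreateContainer", f"{pod_name}/{container_name}"))

    def start_container(self, pod_name, container_name):
        self.calls.append(("StartContainer", f"{pod_name}/{container_name}"))


def sync_pod(pod, node_name, runtime):
    """Bring one Pod up if it is assigned to this node; return its status."""
    if pod["spec"].get("nodeName") != node_name:
        return None  # assigned elsewhere or not yet scheduled

    name = pod["metadata"]["name"]
    ip = runtime.run_pod_sandbox(name)
    for container in pod["spec"]["containers"]:
        runtime.pull_image(container["image"])
        runtime.create_container(name, container["name"])
        runtime.start_container(name, container["name"])
    return {"phase": "Running", "podIP": ip}
```

The early return mirrors the watch filter described above: a Kubelet ignores every Pod whose nodeName is not its own.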

Part 4: The Reconciliation Loop

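Crucially, the Kubelet's job does not end when the container starts. It keeps running any Liveness and Readiness Probes defined in the Pod spec, restarting containers that fail their liveness checks and reporting readiness back to the API server.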

If your Node loses power, its Kubelet stops sending "Heartbeat" updates to the API Server. The node controller inside the kube-controller-manager notices this silence and marks the Node as "NotReady". After a grace period (5 minutes by default), the Pods assigned to the dead node are evicted.
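The timing logic can be sketched as a pure function of how long a node has been silent. The function name `node_state` and the threshold values are illustrative assumptions; actual defaults depend on the Kubernetes version and on cluster configuration.

```python
def node_state(seconds_since_heartbeat, grace_period=40, eviction_timeout=300):
    """Classify a node by the age of its last heartbeat.

    grace_period:     silence tolerated before the node is marked NotReady
                      (illustrative value).
    eviction_timeout: further wait before its Pods are evicted
                      (300 s matches the 5-minute period in the text).
    """
    if seconds_since_heartbeat <= grace_period:
        return "Ready"
    if seconds_since_heartbeat <= grace_period + eviction_timeout:
        return "NotReady"
    return "NotReady, Pods evicted"
```

The two-stage delay is deliberate: a short network blip marks the node NotReady without disturbing its Pods, and only sustained silence triggers eviction.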

If those Pods were managed by a Deployment, the ReplicaSet controller detects that the Current State (2 Pods) no longer matches the Desired State (3 Pods). Without any human intervention, it asks the API Server to create a replacement Pod. The Scheduler sees the new pending Pod and binds it to a healthy Node, that Node's Kubelet spins it up, and the cluster converges back to its Desired State. This is the chaotic, beautiful resilience of Kubernetes.
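The comparison at the heart of the ReplicaSet controller can be sketched as a function from (desired count, live Pods) to a list of actions. The function name `reconcile`, the action tuples, and the naming scheme for new Pods are hypothetical.

```python
def reconcile(desired, live_pods, name_prefix="nginx"):
    """Return the actions needed to make the live Pod count match `desired`.

    The function looks only at the current state, never at past events,
    so calling it again after a missed notification yields the right answer.
    """
    diff = desired - len(live_pods)
    if diff > 0:
        # Too few Pods: ask for replacements.
        return [("create", f"{name_prefix}-new-{i}") for i in range(diff)]
    if diff < 0:
        # Too many Pods: pick some to remove (sorted only for determinism).
        return [("delete", name) for name in sorted(live_pods)[:-diff]]
    return []
```

With a desired count of 3 and two surviving Pods, the sketch returns a single create action; once the replacement exists, the same call returns an empty list and the loop goes quiet.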

Glossary & Concepts

API Server

The central brain. Authenticates requests, validates data, and updates etcd. The only component that talks to etcd directly.

etcd

Consistent, distributed key-value store built on the Raft consensus algorithm. The "source of truth" for the cluster state.

Kube-Scheduler

Watches for unscheduled Pods. Assigns a Node to each Pod based on resource requests, taints, and affinity rules.

Kubelet

The primary "agent" running on every node. Registers the node with the API Server, watches for Pods assigned to it, and runs their containers (via CRI).

CRI (Container Runtime Interface)

Standard API allowing the Kubelet to use different runtimes (containerd, CRI-O, or Docker Engine through the cri-dockerd adapter) interchangeably.

CNI (Container Network Interface)

Plugin architecture (Calico, Flannel, Cilium) that configures network interfaces and IPs for Pods.