GUIDEDECK · for running containers that stay up

Kubernetes
& the art of the
self-healing cluster.

A 38-minute working session on going from one container on your laptop to a fleet that schedules, scales, and repairs itself — Pods, Deployments, Services, config & storage, autoscaling, and the tooling that ties it together.

~38 MIN · BEGINNER → INTERMEDIATE · HANDS-ON CONCEPTS
01 · Why Kubernetes 4 min

One container is easy.
A hundred, across many machines, is not.

You can docker run a single container by hand. But real systems need dozens of copies spread across many servers, restarted when they crash, replaced when a machine dies, and updated without downtime. Doing that by hand is a full-time job. Kubernetes is the robot that does it for you — all day, every day.

Kubernetes (often written K8s: "K", eight letters, "s"): an open-source system that runs your containers across a group of machines and keeps them running. You tell it what you want ("five copies of this app, always up"); it figures out where to place them, watches them, and fixes things when reality drifts from your wish.
Orchestration: automating the placement, networking, scaling, and recovery of many containers. Think of a shipping yard: you don't hand-carry each container; a crane system decides which ship, which slot, and reloads anything that falls. Kubernetes is that crane system for software.

The one idea behind everything: the control loop

  • You write down a desired state ("3 replicas of v2") in a YAML file and hand it to the cluster.
  • Kubernetes constantly compares desired vs the actual state of the cluster.
  • When they differ — a pod crashed, a node vanished — it reconciles: starts, stops, or moves things until reality matches your wish.
  • This is declarative: you describe the destination, not the turn-by-turn steps.
[Figure: control loop. DESIRED (3 replicas · v2) → compare → ACTUAL (2 running · 1 crashed) → fix → RECONCILE (start the missing pod)]

Desired vs actual, forever. The gap is what Kubernetes spends its life closing.
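
Both halves of that comparison are visible on the object itself. As a rough sketch, this is a trimmed view of what `kubectl get deployment api -o yaml` might show while one pod is down. The name `api` and the counts are illustrative assumptions, and a real object carries many more fields. (Deployments are introduced in the next section.)

```yaml
# Trimmed Deployment as stored in the cluster (illustrative values).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                # hypothetical name
spec:
  replicas: 3              # desired state: what you asked for
status:
  replicas: 3              # pods the controller has created
  readyReplicas: 2         # actual state: pods passing readiness right now
  unavailableReplicas: 1   # the gap the control loop is working to close
```

You only ever edit `spec`; the cluster writes `status` and keeps working until the two agree.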

What that buys you

Self-healing. A crashed container or dead machine? Replaced automatically.

Scaling. Run 3 copies at night, 30 at peak — by hand or automatically.

Zero-downtime updates. Roll out a new version gradually; roll back if it misbehaves.

Service discovery. Apps find each other by name, even as copies come and go.

02 · The core objects 7 min

Pods, ReplicaSets,
and the Deployment on top.

Almost everything you run is described by a small set of objects that stack on each other. You'll spend most of your time with three: the Pod (what runs), the ReplicaSet (how many), and the Deployment (how it changes over time).

Pod: the smallest thing Kubernetes runs, namely one or more containers that share an IP address and storage. Usually it's just one app container; occasionally a small helper rides along (a "sidecar"). Pods are disposable: you don't fix a sick pod, you replace it.
[Figure: a Pod (IP 10.1.4.7) holding an app container (api:8080), an optional sidecar (log-shipper), and a volume, with shared network + storage]

Containers in a pod share one IP and can talk over localhost. The pod is the unit of scheduling.

Why a pod, not just a container?

  • Some containers belong together on the same machine — an app and its log shipper, sharing files and a network namespace.
  • The pod is the unit Kubernetes schedules and heals. It lands on one node, lives and dies as a whole.
  • Each pod gets its own IP — but that IP is temporary (the next section fixes that).
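
To make the app-plus-sidecar case concrete, here is a minimal sketch of a two-container Pod that shares a scratch volume. The pod name, the `log-shipper` image, and the mount paths are assumptions for illustration; in practice you would put this inside a Deployment's template rather than create a bare Pod.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-logs          # hypothetical name
spec:
  volumes:
    - name: logs
      emptyDir: {}             # scratch space that lives as long as the pod
  containers:
    - name: api
      image: ghcr.io/acme/api:v2
      ports: [{ containerPort: 8080 }]
      volumeMounts:
        - { name: logs, mountPath: /var/log/app }      # app writes log files here
    - name: log-shipper        # the sidecar
      image: ghcr.io/acme/log-shipper:v1               # hypothetical image
      volumeMounts:
        - { name: logs, mountPath: /logs, readOnly: true }   # sidecar reads the same files
```

Both containers land on the same node, see the same files, and could also reach each other over localhost.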
ReplicaSet: keeps a fixed number of identical pods running. Ask for 3; if one dies, it starts a replacement to bring the count back to 3. You rarely create one directly.
Deployment: manages ReplicaSets for you and handles version changes (rolling updates, rollbacks, history). This is the object you actually write.
[Figure: Deployment (manages rollouts) → ReplicaSet (keeps count = 3) → Pod, Pod, Pod]

You write the Deployment; it creates a ReplicaSet; the ReplicaSet keeps the pods at the count you asked for.

A Deployment, in YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3                # desired count
  selector:
    matchLabels: { app: api }
  template:                  # the pod blueprint
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: ghcr.io/acme/api:v2
          ports: [{ containerPort: 8080 }]

replicas: 3 is the wish; template is the pod stamped out three times. Apply this and the loop does the rest.

Labels & selectors: key/value tags on objects, plus queries that match them. This is the glue of Kubernetes: a ReplicaSet owns "pods labeled app=api", and a Service later sends traffic to that same label. Nothing is wired by hardcoded IDs; everything is matched by labels.
03 · Services & networking 6 min

Pods come and go.
A Service gives them a fixed front door.

Every pod gets its own IP, but pods are replaced constantly — so that IP is a moving target. You can't hardcode it. A Service gives a stable name and address that always points at the healthy pods behind it, no matter how often they churn.

Service: a stable name, IP, and load balancer for a set of pods picked by label. Other apps talk to http://api (the service name) and never worry about which pod or which node answers. The Service spreads requests across whichever pods are currently healthy.
[Figure: caller → Service (app=api) → Pod, Pod, Pod]

The Service selects pods by label and balances across the healthy ones — pods can be replaced without callers noticing.

A Service, in YAML

apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api               # match the Deployment's pods
  ports:
    - port: 80             # the service's port
      targetPort: 8080     # the container's port

Same app: api label as the Deployment — that is the only wiring needed.

Three Service types — how far traffic reaches

ClusterIP

Inside only

The default. A virtual IP reachable only within the cluster. Perfect for one app calling another (frontend → api → db).

NodePort

A port on every node

Opens the same high port on every node's IP. Simple, but raw and rarely used directly in production — mostly a building block.

LoadBalancer

A real external IP

Asks your cloud for an external load balancer with a public IP. The usual way to expose a single TCP service to the internet.
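
In a manifest, the three types differ by a single field. A minimal sketch, assuming the same `app: api` pods as before; the Service name `api-public` is hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-public           # hypothetical name
spec:
  type: LoadBalancer         # omit this line to get the default, ClusterIP; NodePort is the third option
  selector:
    app: api
  ports:
    - port: 80               # port exposed on the external address
      targetPort: 8080       # the container's port
```

On a cluster without a cloud integration (many local clusters, for example) the external IP for a LoadBalancer Service may simply stay pending.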

Ingress: an HTTP/HTTPS router that sends outside traffic to the right Service by host and path. Instead of one cloud load balancer per app, you run one ingress controller and write rules: shop.acme.com → web, shop.acme.com/api → api. It also terminates TLS (HTTPS) in one place.
[Figure: user → Ingress (TLS · routes) → web svc for /, api svc for /api]

One entry point, many backends. Path and host rules decide which Service each request reaches.
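
Those two rules might be written roughly like this. It is a sketch under assumptions: the `ingressClassName` depends on which ingress controller you installed, `shop-tls` is a hypothetical Secret holding the certificate, and both Services are assumed to listen on port 80.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop
spec:
  ingressClassName: nginx          # assumption: matches your installed controller
  tls:
    - hosts: [shop.acme.com]
      secretName: shop-tls         # hypothetical Secret with the TLS cert and key
  rules:
    - host: shop.acme.com
      http:
        paths:
          - path: /api             # shop.acme.com/api → api
            pathType: Prefix
            backend:
              service: { name: api, port: { number: 80 } }
          - path: /                # everything else → web
            pathType: Prefix
            backend:
              service: { name: web, port: { number: 80 } }
```

An Ingress object does nothing by itself; it only takes effect once an ingress controller is running in the cluster.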

Service vs Ingress — when each

  • Service is the in-cluster building block: stable address + load balancing for pods. Every app needs one.
  • Ingress sits in front for HTTP(S) — host/path routing and TLS for many apps behind one address.
  • Rule of thumb: Service for app-to-app, Ingress for browser-to-cluster. Non-HTTP traffic (e.g. a raw TCP database) uses a LoadBalancer Service instead.
04 · Config & storage 5 min

Keep config and data out of the image.

A container image should be the same in every environment — only its settings change. Kubernetes splits that out: ConfigMaps and Secrets hold settings; Volumes and PVCs hold data that must outlive any one pod.

ConfigMap

Non-secret settings

Key/value config — log level, feature flags, a service URL — injected into a pod as environment variables or mounted as files. Change config without rebuilding the image.

Secret

Sensitive values

The same idea for passwords, tokens, and TLS keys. Stored base64-encoded and kept separate so access can be restricted and they can be encrypted at rest. Base64 is not encryption, so treat the cluster's secret store with care.

[Figure: ConfigMap (LOG=info) and Secret (DB_PASS=•••) feed a Pod as env vars (LOG, DB_PASS) or as a mounted file]

The same image reads its settings from env vars or files — supplied fresh per environment.

Reference them in the pod

containers:
  - name: api
    image: acme/api:v2
    envFrom:
      - configMapRef: { name: api-config }
    env:
      - name: DB_PASS
        valueFrom:
          secretKeyRef:
            name: api-secrets
            key: db-password

Config is pulled in by reference — never baked into the image or committed in plaintext.
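
For completeness, the two objects that the snippet above refers to could look like this. The names `api-config`, `api-secrets`, and `db-password` come from the reference; the values and the extra `PAYMENTS_URL` key are placeholders invented for illustration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  LOG: info                        # each key becomes an env var via envFrom
  PAYMENTS_URL: http://payments    # hypothetical extra setting
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
type: Opaque
stringData:                        # written as plain text; stored base64-encoded by the API server
  db-password: change-me           # placeholder value, never commit a real one
```

In a real setup the Secret manifest would usually be generated or injected by a secrets tool rather than kept in the repo next to the ConfigMap.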

Volume: storage attached to a pod. A container's own filesystem is wiped when it restarts, so anything that must survive goes on a volume. For durable storage you use two paired objects:
PersistentVolume (PV): a real piece of storage in the cluster.
PersistentVolumeClaim (PVC): a pod's request for storage of a certain size and type. The pod asks via a PVC; the cluster binds it to a PV.
[Figure: Pod → PVC ("need 10Gi") → PV (real disk); claim → bind]

The pod claims storage by size/type; the cluster binds the claim to a real disk and re-attaches it if the pod moves.
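
A minimal sketch of that claim-and-mount pairing. The names `api-data` and `api-with-data` and the mount path are hypothetical, and the claim assumes the cluster has a default StorageClass that can provision the disk.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: api-data                   # hypothetical name
spec:
  accessModes: [ReadWriteOnce]     # mounted read-write by a single node
  resources:
    requests:
      storage: 10Gi                # the "need 10Gi" from the diagram
---
apiVersion: v1
kind: Pod
metadata:
  name: api-with-data              # hypothetical name
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: api-data        # the pod names the claim, never the disk
  containers:
    - name: api
      image: ghcr.io/acme/api:v2
      volumeMounts:
        - { name: data, mountPath: /var/lib/api }   # hypothetical path
```

Deleting the pod leaves the claim and its data in place; a replacement pod that names the same claim gets the same storage back.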

Stateless vs stateful

  • Most web apps are stateless — keep data in a database, and any pod can serve any request. This is the easy case.
  • Things that store data locally (databases, queues) are stateful — they need stable storage and identity, handled by a StatefulSet + PVCs.
  • Beginner rule: push state into managed databases where you can, and keep your own pods stateless.
05 · Scaling & self-healing 6 min

Stay up, stay fast,
and update without downtime.

This is where the control loop earns its keep. Probes tell Kubernetes whether a pod is healthy, the autoscaler adjusts the replica count to match load, and rolling updates swap versions a few pods at a time.

livenessProbe

Is it alive?

A periodic check (HTTP, TCP, or command). If it fails repeatedly, Kubernetes restarts the container — the cure for a hung or deadlocked process.

readinessProbe

Ready for traffic?

Checks if a pod can serve right now. While it fails, the pod is pulled out of the Service — no requests sent — but not restarted. Great for warm-up and temporary overload.

startupProbe

Done booting?

Guards slow-starting apps: holds off the liveness and readiness checks until the app has finished starting, so a slow boot isn't mistaken for a crash.

Probes & resources, in YAML

readinessProbe:
  httpGet: { path: /healthz, port: 8080 }
  initialDelaySeconds: 5
livenessProbe:
  httpGet: { path: /healthz, port: 8080 }
  periodSeconds: 10
resources:
  requests: { cpu: "100m", memory: "128Mi" }
  limits: { cpu: "500m", memory: "256Mi" }

requests tell the scheduler how much room to reserve; limits cap how much a container may use. The requests are also what the autoscaler's utilization percentage is measured against.
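
The third probe from the cards above is not in that snippet. As a sketch, a startupProbe for the same container could sit next to the other two; the thresholds are illustrative assumptions, not recommendations.

```yaml
# Container-level fragment, alongside readinessProbe and livenessProbe
startupProbe:
  httpGet: { path: /healthz, port: 8080 }
  periodSeconds: 10
  failureThreshold: 30       # up to 30 x 10s = 300s allowed for booting
```

Once the startup probe succeeds a single time, it stops running and the liveness and readiness probes take over.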

Autoscaling — the HPA

Horizontal Pod Autoscaler (HPA): adds or removes pod replicas to keep a metric near a target (e.g. average CPU at 60%). Traffic spikes → more pods; quiet night → fewer. It changes how many pods run, not how big each is.
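apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: api }
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        # percentage of each pod's CPU request, averaged across pods
        target: { type: Utilization, averageUtilization: 60 }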
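
The scaling decision itself is simple arithmetic. The core rule documented for the HPA is shown below, followed by a worked case with made-up load numbers; the real controller also applies a tolerance band and stabilization windows that this sketch leaves out.

```latex
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% Example: 4 replicas averaging 90% CPU against a 60% target
\left\lceil 4 \times \frac{90}{60} \right\rceil = \lceil 6 \rceil = 6

% Later: 6 replicas averaging 30% CPU against the same target
\left\lceil 6 \times \frac{30}{60} \right\rceil = \lceil 3 \rceil = 3
```

The result is always clamped to the range between minReplicas and maxReplicas, so in this setup the count never goes below 3 or above 20.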
[Figure: rolling update. Old v1 pods are replaced by new v2 pods a few at a time; readiness gates each step · rollback if it stalls]

New pods come up and pass readiness before old ones are retired, so serving capacity stays at or near the desired count and a bad version can be rolled back.

Rolling updates & self-healing

  • Change the image in the Deployment and Kubernetes brings up new pods gradually, waiting for each to pass readiness before removing an old one.
  • kubectl rollout undo snaps back to the previous ReplicaSet if the new version misbehaves.
  • Self-healing is the same loop everywhere: crashed pod restarted, dead node's pods rescheduled elsewhere, count always driven back to desired.
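
How gradual "gradually" is can be tuned on the Deployment. A minimal sketch of the relevant fields, with illustrative values; if you leave them out, both maxSurge and maxUnavailable default to 25%.

```yaml
# Fragment of a Deployment's spec (sits alongside replicas and template)
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1            # at most 1 pod above the desired count during the rollout
    maxUnavailable: 0      # never go below the desired count of available pods
minReadySeconds: 10        # a new pod must stay ready this long before it counts as available
```

With these values a 3-replica rollout runs 4 pods at its peak and replaces them one at a time, trading rollout speed for steady capacity.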
06 · The tooling landscape 6 min

How you actually drive a cluster.

You rarely click buttons. You apply YAML with a CLI, template it so it isn't copy-pasted everywhere, and increasingly let Git be the source of truth. And the cluster itself usually comes from a managed provider. Here are the leading tools, with the trade-off for each.

Applying and templating manifests

kubectl

The cluster CLI

The official command-line tool — apply manifests, inspect and debug everything.

Pro: universal, scriptable, works on any cluster.

Con: raw YAML doesn't scale across many environments on its own.

Helm

The package manager

Bundles manifests into reusable, parameterized charts — install Postgres or your app with one command and a values file.

Pro: huge ecosystem of ready-made charts; easy per-env values.

Con: Go-template logic in YAML gets hard to read and debug.

Kustomize

Template-free overlays

Built into kubectl. Keep a plain base, then patch it per environment with overlays — no templating language.

Pro: pure YAML, no new syntax; clean dev/stage/prod diffs.

Con: no packaging/sharing story like Helm charts.

# apply / inspect / debug
kubectl apply -f deploy.yaml
kubectl get pods -w
kubectl describe pod api-7c9
kubectl logs -f api-7c9
kubectl rollout undo deploy/api

How to choose: start with kubectl + Kustomize — zero new languages and it covers most teams. Reach for Helm when you need to package and share an app, or to install third-party software. Many teams use both: Helm for vendored dependencies, Kustomize for their own apps.
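
As an illustration of the Kustomize approach, a production overlay might be nothing more than the file below. The directory layout (`base/`, `overlays/prod/`) is an assumption, and `base/` is assumed to contain its own kustomization.yaml listing the Deployment and Service.

```yaml
# overlays/prod/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base               # the plain, shared manifests
replicas:
  - name: api                # override the Deployment's replica count for prod
    count: 10
images:
  - name: ghcr.io/acme/api   # match the image name used in the base
    newTag: v3               # and swap only the tag
```

It would be applied with `kubectl apply -k overlays/prod`; the base files stay untouched and each environment carries only its differences.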

Where the cluster itself comes from

Running the control plane (the brains) yourself is hard and rarely worth it. Managed services run it for you; you just bring your workloads.

EKS · AWS

Amazon

Pro: deepest AWS integration and reach.

Con: the most assembly required to set up.

GKE · GCP

Google

Pro: widely seen as the most polished; strong autopilot mode.

Con: ties you to Google Cloud.

AKS · Azure

Microsoft

Pro: natural fit for Azure / enterprise shops.

Con: works best when you are already inside the Azure ecosystem.

Self-managed

You run it

Pro: full control; runs on-prem or anywhere.

Con: you own upgrades, security, and 3am pages.

How to choose: use the managed service of whatever cloud you're already on; that's the right answer for the vast majority. Self-manage only for on-prem, strict compliance, or special hardware needs. For learning, a local cluster (kind, k3d, or minikube) is free and instant.

Git as the source of truth

GitOps: a Git repo holds the desired state, and an in-cluster agent continuously makes the cluster match it. You stop running kubectl apply by hand; instead you merge a pull request and the agent syncs it.
Argo CD

UI-driven GitOps

A controller with a rich dashboard that shows sync status and diffs between Git and the live cluster.

Pro: excellent visibility; easy to adopt.

Con: another sizeable system to run and secure.

Flux

Lightweight GitOps

A set of small controllers, Git-native and CLI-first, that compose well with Helm and Kustomize.

Pro: minimal, modular, integrates cleanly.

Con: less of a built-in UI than Argo CD.

How to choose: want a dashboard and a gentle on-ramp? Argo CD. Want a minimal, composable, CLI-first setup? Flux. Either way the win is the same: Git becomes your audit log and your rollback button.
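
To give a feel for what "the agent syncs it" means in practice, here is a sketch of an Argo CD Application that points the cluster at a Git path. The repository URL, branch, path, and namespaces are hypothetical; Flux expresses the same idea with its own resource kinds.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api
  namespace: argocd                  # assumes Argo CD is installed in this namespace
spec:
  project: default
  source:
    repoURL: https://github.com/acme/deploy-config   # hypothetical repo
    targetRevision: main
    path: overlays/prod              # hypothetical path inside the repo
  destination:
    server: https://kubernetes.default.svc           # the cluster Argo CD runs in
    namespace: default
  syncPolicy:
    automated:
      prune: true                    # delete objects that were removed from Git
      selfHeal: true                 # revert manual changes made in the cluster
```

It is the same control loop as in section 1, one level up: Git holds the desired state and the agent closes the gap.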

07 · A worked deployment & recap 4 min

From apply to live, in five commands.

Everything so far comes together in one short workflow: declare it, apply it, expose it, scale it, watch it heal.

# 1 · declare desired state (Deployment + Service)
kubectl apply -f k8s/

# 2 · watch the pods come up and pass readiness
kubectl get pods -w

# 3 · expose it to the world via Ingress (already in k8s/)
kubectl get ingress

# 4 · handle a traffic spike
kubectl scale deploy/api --replicas=10

# 5 · ship a new version: rolling update, zero downtime
kubectl set image deploy/api api=acme/api:v3

Notice every step describes what you want. The control loop from section 1 makes it real and keeps it real.

Five rules to walk out with

1. Declare, don't command. State the desired end; the reconcile loop closes the gap forever.
2. Deployment → ReplicaSet → Pods. You write the Deployment; it manages count and rollouts for you.
3. Wire by label, reach by Service. Pods are disposable; the Service is the stable front door, Ingress the HTTP router.
4. Config and data live outside. ConfigMaps/Secrets for settings; PVCs for anything that must survive a pod.
5. Probes + autoscaler = uptime. Health checks heal, the HPA scales, rolling updates ship without downtime.
