GUIDEDECK · PART 2 · for engineers who already run clusters

Advanced
Kubernetes & the
machinery underneath.

A 38-minute deep dive that picks up where the intro deck left off. We assume you already know Pods, Deployments, and Services, and go down a layer: the control plane, operators and CRDs, networking internals, security, service mesh, and running fleets with GitOps.

~38 MIN · INTERMEDIATE → ADVANCED · BUILDS ON PART 1
01 · The control plane, deep · 6 min

Behind the magic is one
relentless reconcile loop.

Part 1 introduced the idea that you declare desired state and the cluster makes it real (recap in the intro deck). Now we open the box. Every "Kubernetes did a thing" is some controller watching the API server, comparing desired to observed, and nudging reality one step closer. No central brain issues commands — dozens of small loops each own one slice.

Control plane: the components that make global decisions about the cluster, namely the API server (the single front door), etcd (the datastore), the scheduler (places pods on nodes), and the controller manager (a bundle of reconcile loops). Worker nodes run the kubelet and your pods; they only ever talk to the API server.
[Diagram: kube-apiserver is the only front door and the only writer to etcd (Raft, source of truth); the scheduler (picks a node), controller-mgr (reconcile loops), and the kubelet on each node (runs the pods) all connect through it.]

Only the API server reads and writes etcd. Every other component is a client that watches and acts.

Who does what

  • API server — a stateless REST front end. It authenticates, authorizes, runs admission, validates, and persists. Horizontally scalable because the state lives in etcd, not in it.
  • etcd — a distributed key/value store using the Raft consensus algorithm for a strongly-consistent, replicated log. It is the one piece you must back up.
  • Scheduler — watches for pods with no node, scores the feasible nodes, and writes its choice back. It only decides; the kubelet does the running.
  • Controllers — each watches one object kind and drives it toward spec. Deployment, ReplicaSet, Job, Node — all just loops.
The reconcile loop: watch for change, compare desired spec to observed status, take one action, repeat. Controllers are level-triggered, not edge-triggered: they act on the current state, not on the event that woke them. Miss an event and the next resync still corrects it — which is why Kubernetes is so resilient to dropped messages and restarts.
// every controller is this shape (pseudocode)
for {
    obj := informer.Next()    // woken by a watch event
    desired := obj.Spec       // what you asked for
    actual := observe(obj)    // what the world looks like
    if desired != actual {
        act(obj)              // ONE step toward desired
        requeue(obj)          // re-check on a later pass instead of exiting the loop
    }
}
[Diagram: API server → (watch) → informer (cache + queue) → Reconcile() (one step), which writes back to the API server and requeues.]

An informer turns a watch stream into a cached work queue; the reconcile function runs once per item and requeues itself.
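To make the queue-plus-reconcile shape concrete, here is a minimal sketch in plain Go with no Kubernetes libraries. The names `Object`, `Queue`, `Reconcile`, and `Run` are hypothetical stand-ins, not the client-go or controller-runtime API. It shows two properties described above: duplicate events for the same key collapse into one work item, and each reconcile call takes a single step and requeues until the object converges.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for an API object:
// DesiredReplicas plays the role of spec, ActualReplicas of observed state.
type Object struct {
	Name            string
	DesiredReplicas int
	ActualReplicas  int
}

// Queue is a tiny de-duplicating work queue keyed by object name.
type Queue struct {
	order   []string
	pending map[string]bool
}

func NewQueue() *Queue { return &Queue{pending: map[string]bool{}} }

// Add enqueues a key once; repeated events for a pending key collapse.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Pop returns the next key, or false when the queue is empty.
func (q *Queue) Pop() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Reconcile looks only at current state (level-triggered), takes ONE step
// toward the desired state, and reports whether the object needs another pass.
func Reconcile(o *Object) (requeue bool) {
	switch {
	case o.ActualReplicas < o.DesiredReplicas:
		o.ActualReplicas++
	case o.ActualReplicas > o.DesiredReplicas:
		o.ActualReplicas--
	default:
		return false // converged, nothing to do
	}
	return true
}

// Run drains the queue, requeueing objects until they converge.
// It returns the number of Reconcile calls made.
func Run(q *Queue, store map[string]*Object) int {
	steps := 0
	for {
		key, ok := q.Pop()
		if !ok {
			return steps
		}
		steps++
		if Reconcile(store[key]) {
			q.Add(key)
		}
	}
}

func main() {
	store := map[string]*Object{"web": {Name: "web", DesiredReplicas: 3}}
	q := NewQueue()
	q.Add("web")
	q.Add("web") // a second event for the same key is collapsed
	steps := Run(q, store)
	fmt.Println("replicas:", store["web"].ActualReplicas, "reconcile calls:", steps)
}
```

Because `Reconcile` never inspects which event woke it, dropping or duplicating events changes only how soon convergence happens, not whether it happens.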

Admission control: the gate between "authorized" and "stored". After authn/authz, the API server runs mutating webhooks (inject a sidecar, set defaults) then validating webhooks (reject what breaks policy). This is the hook every policy engine and service mesh plugs into — remember it for sections 4 and 5.
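The ordering matters: validators see the object after every mutation has been applied. The sketch below models that chain in plain Go. `Pod`, `Mutator`, `Validator`, `Admit`, `injectSidecar`, and `denyRoot` are all hypothetical names for illustration; real admission webhooks exchange AdmissionReview objects over HTTPS, which this deliberately leaves out.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical, heavily simplified object under admission.
type Pod struct {
	Name       string
	Containers []string
	RunAsRoot  bool
}

// Mutator may change the object; Validator may only accept or reject it.
type Mutator func(*Pod)
type Validator func(Pod) error

// Admit runs every mutator first, then every validator against the mutated
// object. The first validation error rejects the request.
func Admit(p Pod, mutators []Mutator, validators []Validator) (Pod, error) {
	// Copy the slice so mutations never leak back into the caller's object.
	p.Containers = append([]string(nil), p.Containers...)
	for _, mutate := range mutators {
		mutate(&p)
	}
	for _, validate := range validators {
		if err := validate(p); err != nil {
			return Pod{}, err
		}
	}
	return p, nil
}

// injectSidecar mimics a mesh adding a proxy container.
func injectSidecar(p *Pod) { p.Containers = append(p.Containers, "proxy") }

// denyRoot mimics a policy engine rejecting root pods.
func denyRoot(p Pod) error {
	if p.RunAsRoot {
		return errors.New("pods must not run as root")
	}
	return nil
}

func main() {
	mutators := []Mutator{injectSidecar}
	validators := []Validator{denyRoot}

	admitted, err := Admit(Pod{Name: "api", Containers: []string{"app"}}, mutators, validators)
	fmt.Println(admitted.Containers, err)

	_, err = Admit(Pod{Name: "debug", RunAsRoot: true}, mutators, validators)
	fmt.Println(err)
}
```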
02 · Operators & CRDs · 6 min

Teach the cluster new nouns
and the loops to run them.

The control loop isn't just for built-in types. You can add your own object kinds with a CRD, then write a controller that reconciles them — an operator. That is how Postgres, Kafka, cert-manager, and Argo all become first-class "kinds" you manage with kubectl apply.

CustomResourceDefinition (CRD): a schema that registers a brand-new object kind with the API server. Once applied, kind: Database is as real as kind: Pod: stored in etcd, validated by an OpenAPI schema, served over REST, RBAC-controlled, and watchable. A CRD adds the noun; it does nothing on its own.
Operator: a CRD plus a custom controller that encodes operational know-how. The controller watches your custom resource and does what a human SRE would: provision, configure, back up, fail over, upgrade. It turns a runbook into a reconcile loop.
# a custom resource — your new noun
apiVersion: acme.io/v1
kind: Database
metadata:
  name: orders
spec:
  engine: postgres-16
  replicas: 3
  backup: { schedule: "0 2 * * *" }
status:              # written by the operator, not you
  phase: Ready
[Diagram: the operator watches the Database CR (orders · spec) and creates a StatefulSet, a Service, and a CronJob.]

You write the Database; the operator reconciles it into the built-in objects that actually run Postgres.

Two details that separate a toy from a real operator

status subresource

Spec is yours, status is theirs

A well-behaved operator never writes spec (the user owns that) and reports progress only in status. The /status subresource lets it update status without bumping the object's generation — so a controller that reacts only to generation changes doesn't trigger its own reconcile in a loop.

finalizers

Clean up before you vanish

A finalizer is a string on an object that blocks deletion until the operator removes it. On delete, the API server sets deletionTimestamp instead of removing the object; the operator runs teardown (delete the cloud disk, deregister DNS), then drops the finalizer and Kubernetes completes the delete.
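The deletion handshake can be sketched as a toy object store in plain Go. Everything here is hypothetical: the finalizer name `acme.io/cleanup`, the `Store` type, and the `Deleting` flag (a stand-in for a set deletionTimestamp). The point is the ordering: a delete only marks the object, and the object disappears only after teardown succeeds and the finalizer is dropped.

```go
package main

import (
	"errors"
	"fmt"
)

// finalizer is a hypothetical finalizer name owned by our operator.
const finalizer = "acme.io/cleanup"

// Object is a simplified API object.
type Object struct {
	Name       string
	Deleting   bool // stands in for a non-nil deletionTimestamp
	Finalizers []string
}

// Store is a toy object store keyed by name.
type Store map[string]*Object

// Delete marks the object for deletion. It is removed immediately only
// when no finalizers are present.
func (s Store) Delete(name string) {
	o, ok := s[name]
	if !ok {
		return
	}
	o.Deleting = true
	s.collect(name)
}

// collect removes an object that is marked for deletion and has no finalizers left.
func (s Store) collect(name string) {
	if o := s[name]; o != nil && o.Deleting && len(o.Finalizers) == 0 {
		delete(s, name)
	}
}

// Reconcile is the operator side: for a deleting object, run teardown,
// and only on success drop our finalizer so the delete can complete.
func (s Store) Reconcile(name string, teardown func(*Object) error) error {
	o, ok := s[name]
	if !ok || !o.Deleting {
		return nil
	}
	if err := teardown(o); err != nil {
		return err // finalizer stays, so deletion remains blocked
	}
	kept := o.Finalizers[:0]
	for _, f := range o.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	o.Finalizers = kept
	s.collect(name)
	return nil
}

func main() {
	s := Store{"orders": {Name: "orders", Finalizers: []string{finalizer}}}

	s.Delete("orders")
	_, exists := s["orders"]
	fmt.Println("still present after delete:", exists)

	err := s.Reconcile("orders", func(*Object) error { return errors.New("cloud API down") })
	_, exists = s["orders"]
	fmt.Println("teardown failed:", err, "still present:", exists)

	_ = s.Reconcile("orders", func(*Object) error { return nil })
	_, exists = s["orders"]
	fmt.Println("present after successful teardown:", exists)
}
```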

Tooling — building an operator

Kubebuilder

The SIG-native scaffolder

The community project that wraps controller-runtime: scaffolds CRD types, the manager, and reconcilers in Go.

Pro: closest to upstream, minimal magic, the de-facto base everything else builds on.

Con: Go-only and lower-level; you wire more yourself.

Operator SDK

The batteries-included kit

Red Hat's toolkit built on top of Kubebuilder, adding Helm- and Ansible-based operators plus OLM packaging and scorecard tests.

Pro: non-Go paths (Helm/Ansible) and a richer release story.

Con: more layers and conventions to learn.

How to choose: for Go operators, both produce the same controller-runtime reconciler under the hood. Pick Kubebuilder for a lean Go operator close to upstream; pick Operator SDK when you want a Helm/Ansible path or OLM distribution. Before writing any operator, check whether one already exists — most popular software already has one.

03 · Networking internals · 6 min

How a packet actually
reaches a pod.

Part 1 said "every pod gets an IP and a Service load-balances across them." True — but who assigns those IPs, and what turns a Service's virtual IP into a real pod? The answers are CNI for pod networking, kube-proxy (or its eBPF replacements) for Services, and NetworkPolicy for the firewall.

CNI — Container Network Interface: a plugin spec used to give each pod its network. When a pod is scheduled, the kubelet (through the container runtime) invokes the CNI plugin (Calico, Cilium, the cloud's own) to allocate an IP and wire the pod into a flat network where every pod can reach every other pod directly, no NAT. The plugin is also where overlay vs native routing is decided.
kube-proxy & the Service VIP: a Service's ClusterIP is virtual; nothing listens on it. kube-proxy programs each node's kernel (iptables or IPVS) to rewrite traffic for the VIP onto a real pod IP from the EndpointSlice. Modern dataplanes like Cilium replace kube-proxy entirely with eBPF programs for lower latency at scale.
[Diagram: client pod → api:80 → node kernel (iptables/IPVS/eBPF) → DNAT to one endpoint among pods 10.1.4.7, 10.2.9.2, and 10.3.1.5.]

There is no Service process. The node's kernel rewrites the VIP to one healthy endpoint — load balancing happens in the dataplane.

The path, end to end

  • The client resolves api via cluster DNS (CoreDNS) to the Service's ClusterIP.
  • The packet leaves the pod; the node kernel matches the VIP and DNATs it to a real pod IP chosen from the EndpointSlice.
  • EndpointSlices (the scalable successor to Endpoints) are what the readiness probe ultimately edits — fail readiness and your IP is removed here.
  • For HTTP, an Ingress controller or the newer Gateway API terminates TLS and routes by host/path before this even happens.
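The VIP-to-endpoint step in the path above can be modelled in a few lines of Go. This is a sketch under stated assumptions: `Dataplane`, `Endpoint`, `SetEndpoints`, and `DNAT` are hypothetical names, the ClusterIP `10.96.0.10` is made up, and round-robin is used only to keep the example deterministic (real dataplanes choose endpoints differently). It shows that traffic to a VIP lands only on ready endpoints, and that traffic to a plain pod IP passes through untouched.

```go
package main

import "fmt"

// Endpoint is one pod backing a Service, with its readiness.
type Endpoint struct {
	IP    string
	Ready bool
}

// Dataplane is a toy model of the per-node rules that a Service proxy programs.
type Dataplane struct {
	rules map[string][]Endpoint // Service VIP -> endpoints
	next  map[string]int        // per-VIP round-robin cursor
}

func NewDataplane() *Dataplane {
	return &Dataplane{rules: map[string][]Endpoint{}, next: map[string]int{}}
}

// SetEndpoints replaces the endpoint list for a VIP, as an EndpointSlice update would.
func (d *Dataplane) SetEndpoints(vip string, eps []Endpoint) { d.rules[vip] = eps }

// DNAT rewrites a destination address. A VIP becomes one ready endpoint;
// any other address passes through. ok is false when a VIP has no ready endpoint.
func (d *Dataplane) DNAT(dst string) (rewritten string, ok bool) {
	eps, isVIP := d.rules[dst]
	if !isVIP {
		return dst, true
	}
	var ready []string
	for _, e := range eps {
		if e.Ready {
			ready = append(ready, e.IP)
		}
	}
	if len(ready) == 0 {
		return "", false
	}
	i := d.next[dst] % len(ready)
	d.next[dst]++
	return ready[i], true
}

func main() {
	d := NewDataplane()
	d.SetEndpoints("10.96.0.10", []Endpoint{
		{IP: "10.1.4.7", Ready: true},
		{IP: "10.2.9.2", Ready: false}, // failing readiness: never selected
		{IP: "10.3.1.5", Ready: true},
	})
	for i := 0; i < 3; i++ {
		ip, ok := d.DNAT("10.96.0.10")
		fmt.Println(ip, ok)
	}
}
```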
NetworkPolicy: a pod-level firewall selected by labels. By default every pod can talk to every other pod. The moment any NetworkPolicy selects a pod, that pod flips to default-deny for the direction(s) you specify — only the allowed traffic gets through. Policies are enforced by the CNI plugin, so your CNI must support them (Calico and Cilium do; some basic plugins do not).
# deny all ingress, then add narrow allows
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata: { name: db-lockdown }
spec:
  podSelector:
    matchLabels: { app: db }
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels: { app: api }   # only api → db
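The "selected means default-deny" rule is easy to misread, so here is a small Go sketch of the ingress decision. It is a simplification under stated assumptions: `Labels`, `Policy`, and `IngressAllowed` are hypothetical names, all pods are assumed to be in one namespace, and ports, namespace selectors, and IP blocks are ignored.

```go
package main

import "fmt"

// Labels is a pod's label set, or a label selector (all pairs must match).
type Labels map[string]string

// matches reports whether every key/value in selector is present in labels.
// An empty selector matches everything.
func matches(selector, labels Labels) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// Policy is a simplified ingress-only NetworkPolicy.
type Policy struct {
	PodSelector Labels   // which pods the policy selects
	AllowFrom   []Labels // peers allowed to connect to the selected pods
}

// IngressAllowed decides whether src may connect to dst.
// If no policy selects dst, traffic is allowed (the flat default).
// Once any policy selects dst, only explicitly allowed peers get through.
func IngressAllowed(policies []Policy, src, dst Labels) bool {
	selected := false
	for _, p := range policies {
		if !matches(p.PodSelector, dst) {
			continue
		}
		selected = true
		for _, peer := range p.AllowFrom {
			if matches(peer, src) {
				return true
			}
		}
	}
	return !selected
}

func main() {
	// Mirrors the db-lockdown policy: only app=api may reach app=db.
	policies := []Policy{{
		PodSelector: Labels{"app": "db"},
		AllowFrom:   []Labels{{"app": "api"}},
	}}
	api, web, db := Labels{"app": "api"}, Labels{"app": "web"}, Labels{"app": "db"}

	fmt.Println("api -> db:", IngressAllowed(policies, api, db))
	fmt.Println("web -> db:", IngressAllowed(policies, web, db))
	fmt.Println("web -> api:", IngressAllowed(policies, web, api))
}
```

Note that policies are additive: a second policy selecting the same pod can only widen what is allowed, never narrow it.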

Why this matters

  • A flat network means a compromised front-end can reach your database by default. NetworkPolicy is how you re-introduce segmentation.
  • Best practice: a namespace-wide default-deny, then explicit allows per dependency — least privilege at the network layer.
  • Policies are label-driven, like everything else, so they follow pods as they reschedule — no IPs to chase.
  • For L7 rules (by HTTP path/method) you need a CNI like Cilium or a service mesh — plain NetworkPolicy is L3/L4 only.
04 · Security · 5 min

Least privilege, all the way
from RBAC to the image.

A cluster has many ways to be too permissive: identities that can do anything, pods that run as root, secrets in plaintext, and images nobody verified. The defences stack — RBAC for who, Pod Security for what a pod may do, encrypted secrets, and supply-chain checks for what you even allow to run.

RBAC — Role-Based Access Control: grant verbs on resources to subjects, and nothing else. A Role (namespaced) or ClusterRole (cluster-wide) lists allowed verbs (get, list, create) on resource kinds. A RoleBinding attaches that role to a subject — a user, group, or ServiceAccount. RBAC is purely additive: there are no deny rules, so you grant up from zero.
ServiceAccount: an identity for a pod, not a human. Every pod runs as one; its API calls carry a short-lived, projected token bound to that pod's lifetime (modern clusters dropped the old never-expiring tokens). Pair a dedicated ServiceAccount with a tight Role so each workload can touch only what it needs — not the whole cluster.
[Diagram: ServiceAccount api-sa → RoleBinding (subject → role) → Role (get, list) → resources (pods, configmaps).]

Subject + Role = RoleBinding. Verbs and resources live in the Role; the binding just says "this identity may use it."
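A purely additive model means authorization is a search for any matching grant, with deny as the fallthrough. The Go sketch below illustrates that; `Rule`, `Role`, `Binding`, `Authorizer`, and the role name `pod-reader` are hypothetical, and real RBAC also covers API groups, namespaces, resource names, and wildcards, all omitted here.

```go
package main

import "fmt"

// Rule grants a set of verbs on a set of resource kinds.
type Rule struct {
	Verbs     []string
	Resources []string
}

// Role is a named bundle of rules.
type Role struct {
	Rules []Rule
}

// Binding attaches a role to a subject such as a ServiceAccount.
type Binding struct {
	Subject string
	Role    string
}

// Authorizer holds every role and binding in our toy cluster.
type Authorizer struct {
	Roles    map[string]Role
	Bindings []Binding
}

func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want {
			return true
		}
	}
	return false
}

// Allowed returns true if ANY bound rule grants the verb on the resource.
// There is no deny rule: the absence of a grant is the denial.
func (a Authorizer) Allowed(subject, verb, resource string) bool {
	for _, b := range a.Bindings {
		if b.Subject != subject {
			continue
		}
		for _, r := range a.Roles[b.Role].Rules {
			if contains(r.Verbs, verb) && contains(r.Resources, resource) {
				return true
			}
		}
	}
	return false
}

// exampleAuthorizer lets api-sa get and list pods and configmaps, nothing else.
func exampleAuthorizer() Authorizer {
	return Authorizer{
		Roles: map[string]Role{
			"pod-reader": {Rules: []Rule{{
				Verbs:     []string{"get", "list"},
				Resources: []string{"pods", "configmaps"},
			}}},
		},
		Bindings: []Binding{{Subject: "api-sa", Role: "pod-reader"}},
	}
}

func main() {
	a := exampleAuthorizer()
	fmt.Println("api-sa get pods:", a.Allowed("api-sa", "get", "pods"))
	fmt.Println("api-sa delete pods:", a.Allowed("api-sa", "delete", "pods"))
	fmt.Println("other-sa get pods:", a.Allowed("other-sa", "get", "pods"))
}
```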

Pod Security & secrets

  • Pod Security Admission replaced the old PodSecurityPolicy. It enforces three levels per namespace — privileged, baseline, restricted — by a simple label. Aim for restricted: no root, no privilege escalation, dropped capabilities.
  • Secrets are only base64-encoded in etcd by default, not encrypted. Turn on encryption at rest (a KMS provider) and restrict get secrets via RBAC — or keep secrets in an external store (Vault, cloud secret managers) and sync them in with the External Secrets Operator.
  • Disable automountServiceAccountToken where a pod doesn't call the API at all.
Supply-chain security: prove an image is the one you built and trust before it runs. Sign images with Sigstore/cosign, generate an SBOM (software bill of materials), and have an admission policy reject unsigned or unscanned images. The policy engine that enforces this is the same admission hook from section 1.
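The core of a signature gate is small: sign the image digest at build time, verify it against trusted keys at admission. The sketch below uses Go's standard `crypto/ed25519` as a stand-in; it is not how cosign stores or verifies signatures, and `SignedImage`, `NewKeyPair`, `Sign`, `Admit`, and the image reference are hypothetical names chosen for illustration.

```go
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// SignedImage pairs an image digest with a detached signature over that digest.
type SignedImage struct {
	Ref       string
	Digest    [32]byte
	Signature []byte
}

// NewKeyPair generates a signing key pair using the system's secure randomness.
func NewKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// Sign hashes the image content and signs the digest (the build-time step).
func Sign(priv ed25519.PrivateKey, ref string, content []byte) SignedImage {
	digest := sha256.Sum256(content)
	return SignedImage{Ref: ref, Digest: digest, Signature: ed25519.Sign(priv, digest[:])}
}

// Admit is the admission-time step: it returns nil only when the signature
// over the digest verifies against at least one trusted public key.
func Admit(img SignedImage, trusted ...ed25519.PublicKey) error {
	for _, key := range trusted {
		if ed25519.Verify(key, img.Digest[:], img.Signature) {
			return nil
		}
	}
	return fmt.Errorf("image %s: no valid signature from a trusted key", img.Ref)
}

func main() {
	pub, priv := NewKeyPair()
	img := Sign(priv, "registry.example/app:1.0", []byte("image layers"))
	fmt.Println("signed image:", Admit(img, pub))

	img.Digest[0] ^= 0xff // simulate content that differs from what was signed
	fmt.Println("tampered image:", Admit(img, pub))

	fmt.Println("unsigned image:", Admit(SignedImage{Ref: "registry.example/unknown:latest"}, pub))
}
```

Verifying the digest rather than the tag is the design point: a tag can be moved to different content, a digest cannot.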

Tooling — policy enforcement

OPA / Gatekeeper

General policy in Rego

Open Policy Agent with the Gatekeeper admission controller. Constraints are written in Rego, a purpose-built policy language reusable far beyond Kubernetes.

Pro: extremely expressive; one policy engine across CI, APIs, and clusters.

Con: Rego has a real learning curve.

Kyverno

Policy as YAML

A Kubernetes-native engine where policies are CRDs written in YAML — validate, mutate, and generate resources without a new language.

Pro: no DSL; mutate/generate built in; instantly familiar.

Con: Kubernetes-only and less expressive for complex logic.

How to choose: want policy that spans more than Kubernetes, or truly complex rules? OPA/Gatekeeper. Want to ship guardrails this afternoon in plain YAML, including mutation? Kyverno. Both run as validating/mutating webhooks — the choice is language, not mechanism.

05 · Service mesh & traffic · 5 min

Move mTLS and routing
out of your app.

Once you have many services, you want encrypted service-to-service traffic, retries, and fine-grained traffic splitting — without building it into every app. A service mesh puts a proxy beside each workload and manages all of that from the platform layer.

Service mesh: a dedicated proxy layer that handles service-to-service networking. A data plane of proxies (classically an Envoy sidecar per pod) carries the traffic; a control plane configures them. Your app just calls http://orders as before — the proxy transparently adds mTLS, retries, timeouts, and metrics.
mTLS — mutual TLS: both sides present certificates, so traffic is encrypted and identity is verified. The mesh issues a short-lived cert per workload identity (often SPIFFE-style) and rotates it automatically. You get zero-trust in-cluster traffic without changing application code.
[Diagram: app pod → proxy → orders v1 (90%) and orders v2 (10%, canary), over mTLS with a weighted split.]

The proxy encrypts every hop and splits traffic by weight — 90% stable, 10% to the canary — with no app change.

Canary & traffic splitting

  • A canary sends a small slice of real traffic to a new version, watches its metrics, and only then ramps up — or rolls back automatically.
  • The mesh splits by weight or by request attributes (header, user), independent of replica counts — far finer than a Deployment's rolling update.
  • Retries, timeouts, and circuit breaking move into the proxy, so every service gets them consistently.
  • A newer architecture — Istio's ambient mode — drops the per-pod sidecar for a per-node proxy plus optional L7 waypoints, cutting overhead.
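Weighted splitting comes down to mapping each request onto a backend by cumulative weight. The sketch below is plain Go with hypothetical names (`Backend`, `Pick`, `Distribute`). A real proxy would draw a random number per request; here the roll cycles through every value so the resulting split is exact and repeatable.

```go
package main

import "fmt"

// Backend is one version of a service with its share of traffic.
type Backend struct {
	Name   string
	Weight int
}

// Pick maps a roll in [0, total weight) onto a backend by cumulative weight.
// It returns "" when the roll falls outside the total weight.
func Pick(backends []Backend, roll int) string {
	for _, b := range backends {
		if roll < b.Weight {
			return b.Name
		}
		roll -= b.Weight
	}
	return ""
}

// Distribute routes a number of requests and counts hits per backend.
// The roll is deterministic (request index modulo total weight); a real
// proxy would use a random roll, which converges to the same proportions.
func Distribute(backends []Backend, requests int) map[string]int {
	total := 0
	for _, b := range backends {
		total += b.Weight
	}
	counts := map[string]int{}
	if total == 0 {
		return counts
	}
	for i := 0; i < requests; i++ {
		counts[Pick(backends, i%total)]++
	}
	return counts
}

func main() {
	split := []Backend{{Name: "orders-v1", Weight: 90}, {Name: "orders-v2", Weight: 10}}
	counts := Distribute(split, 1000)
	fmt.Println("v1:", counts["orders-v1"], "v2:", counts["orders-v2"])
}
```

The weights are independent of how many replicas each version runs, which is what lets a single canary pod take exactly the slice of traffic you choose.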

Tooling — the mesh itself

Istio

The full-featured mesh

Envoy-based, with deep traffic management, policy, and multi-cluster support; now offers sidecar and sidecarless ambient modes.

Pro: the most capable; handles complex routing and large multi-cluster setups.

Con: large surface area and real operational weight.

Linkerd

The lightweight mesh

A CNCF-graduated mesh built on a purpose-built Rust micro-proxy, optimized for simplicity and low overhead.

Pro: simple, fast, easy to operate; mTLS on by default.

Con: fewer advanced traffic features than Istio.

How to choose: want maximum control, complex routing, or multi-cluster? Istio (consider ambient mode to cut overhead). Want mTLS and golden metrics with the least operational burden? Linkerd. And remember a mesh is not free — adopt one only when app-level networking pain justifies the proxies.

06 · GitOps & scaling at scale · 5 min

Run fleets the way you
run one cluster.

At scale, two problems dominate: keeping many clusters in a known, audited state, and right-sizing both pods and nodes as load moves. GitOps answers the first; a stack of autoscalers answers the second.

GitOps: Git is the desired state, and an in-cluster agent continuously reconciles the cluster to match it. It is the reconcile loop from section 1 applied to your whole config: merge a PR, the agent syncs it, and any manual drift is reverted. Git becomes your audit log and your rollback button.
[Diagram: the in-cluster GitOps agent pulls desired state from the Git repo and applies it to the cluster's live objects; detected drift is reverted.]

The agent pulls from Git and reconciles continuously — a change made by hand drifts, then gets reverted to match the repo.
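One sync pass of such an agent is a three-way comparison: create what is missing, revert what drifted, and (if pruning is enabled) remove what Git no longer declares. The Go sketch below models that with maps; `State`, `Action`, and `Sync` are hypothetical names, manifests are reduced to strings, and pruning is assumed to be on, whereas real GitOps engines typically make it an explicit setting.

```go
package main

import (
	"fmt"
	"sort"
)

// State maps an object name to its manifest (reduced to a string here).
type State map[string]string

// Action records one change the agent made during a sync.
type Action struct {
	Kind string // "create", "revert", or "prune"
	Name string
}

// Sync makes live match desired and returns the actions taken, in name order.
// Pruning of objects missing from Git is assumed to be enabled.
func Sync(desired, live State) []Action {
	seen := map[string]bool{}
	for name := range desired {
		seen[name] = true
	}
	for name := range live {
		seen[name] = true
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for logs and tests

	var actions []Action
	for _, name := range names {
		want, inGit := desired[name]
		have, inCluster := live[name]
		switch {
		case inGit && !inCluster:
			live[name] = want
			actions = append(actions, Action{Kind: "create", Name: name})
		case inGit && have != want:
			live[name] = want // manual drift is overwritten with the Git version
			actions = append(actions, Action{Kind: "revert", Name: name})
		case !inGit:
			delete(live, name)
			actions = append(actions, Action{Kind: "prune", Name: name})
		}
	}
	return actions
}

func main() {
	desired := State{"api": "image:v2", "db": "image:v1"}
	live := State{"api": "image:v1", "debug": "image:tmp"} // api drifted, debug added by hand

	fmt.Println("first sync:", Sync(desired, live))
	fmt.Println("second sync:", Sync(desired, live)) // nothing left to do
}
```

Running the same sync twice and getting no actions the second time is the idempotence that makes continuous reconciliation safe.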

The three autoscalers

  • HPA (horizontal) — adds/removes pod replicas to keep a metric near target. The everyday one (Part 1).
  • VPA (vertical) — right-sizes each pod's CPU/memory requests. Great for hard-to-parallelize workloads; don't run it on the same metric as the HPA.
  • Cluster Autoscaler / Karpenter — add or remove nodes when pods can't be scheduled or sit idle. Karpenter provisions right-sized nodes just-in-time.
  • Together: HPA scales pods, the node autoscaler grows the cluster to fit them, VPA tunes the sizes.
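The HPA's core calculation is a single ratio: desired replicas = ceil(current replicas × current metric ÷ target metric), then clamped to the configured bounds. The Go sketch below shows that arithmetic. The function name `DesiredReplicas` and its parameters are hypothetical, the target is assumed to be greater than zero, and refinements such as the tolerance band and stabilization windows are left out.

```go
package main

import (
	"fmt"
	"math"
)

// DesiredReplicas applies the horizontal scaling ratio and clamps the result.
// currentMetric and targetMetric are in the same unit (for example average
// CPU utilization in percent). targetMetric is assumed to be > 0.
func DesiredReplicas(current int, currentMetric, targetMetric float64, minReplicas, maxReplicas int) int {
	desired := int(math.Ceil(float64(current) * currentMetric / targetMetric))
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 4 pods at 90% CPU with a 60% target: scale out.
	fmt.Println(DesiredReplicas(4, 90, 60, 1, 20))
	// 10 pods at 30% CPU with a 60% target: scale in.
	fmt.Println(DesiredReplicas(10, 30, 60, 1, 20))
	// A large spike is capped at maxReplicas.
	fmt.Println(DesiredReplicas(10, 600, 60, 1, 20))
}
```

When the resulting pods no longer fit on existing nodes, they go unschedulable, which is the signal the node autoscaler acts on.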
Progressive delivery & multi-cluster: automate canaries, and treat many clusters as one fleet. Tools like Argo Rollouts and Flagger drive the metric-gated canaries from section 5 declaratively. For many clusters, Cluster API manages cluster lifecycle as Kubernetes objects, and GitOps "app-of-apps" / fleet patterns roll config out everywhere.

Tooling — GitOps engines

Argo CD

UI-driven GitOps

A controller with a rich dashboard showing sync status and live-vs-Git diffs; pairs with Argo Rollouts for progressive delivery.

Pro: excellent visibility; gentle on-ramp; strong UI.

Con: another sizeable system to run and secure.

Flux

Lightweight GitOps

A set of small, composable controllers, Git-native and CLI-first, that integrate cleanly with Helm and Kustomize.

Pro: minimal, modular, easy to embed in automation.

Con: no built-in UI as rich as Argo's.

How to choose: want a dashboard and an easy start? Argo CD. Want a minimal, composable, automation-first setup? Flux. Both make Git the source of truth; the difference is surface area, not idea.

07 · A worked operator & recap · 5 min

One CRD, one loop —
everything else is detail.

Tie it together with the shape of a tiny operator that codifies a policy: "every Database must have an off-site backup." It is the same reconcile loop from section 1, now running your noun.

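// the heart of an operator — controller-runtime
// imports: context, errors, time, ctrl "sigs.k8s.io/controller-runtime", "sigs.k8s.io/controller-runtime/pkg/client"
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    var db Database
    if err := r.Get(ctx, req.NamespacedName, &db); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err) // object already gone: nothing to do
    }
    if db.Spec.Backup == nil { // enforce policy
        return ctrl.Result{}, errors.New("backup is required")
    }
    if err := r.ensureStatefulSet(ctx, &db); err != nil { // converge built-ins
        return ctrl.Result{}, err
    }
    if err := r.ensureBackupCronJob(ctx, &db); err != nil {
        return ctrl.Result{}, err
    }
    db.Status.Phase = "Ready"
    if err := r.Status().Update(ctx, &db); err != nil { // status, never spec
        return ctrl.Result{}, err
    }
    return ctrl.Result{RequeueAfter: time.Minute}, nil // level-triggered, forever (interval is illustrative)
}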

Watch the resource, enforce the rule, converge the built-in objects, report status, requeue. That single pattern powers every controller in the cluster.

Five rules to walk out with

1. It's loops all the way down. The API server stores intent; level-triggered controllers reconcile it forever.
2. Extend with CRDs + operators. A new noun plus a reconcile loop turns a runbook into software.
3. The network is programmable. CNI gives pods IPs, kube-proxy/eBPF resolves Services, NetworkPolicy segments them.
4. Least privilege, layered. RBAC, restricted Pod Security, encrypted secrets, signed images — and policy at admission.
5. Push platform concerns down. Mesh for mTLS/traffic, GitOps for state, autoscalers for size.
  • No operator until an existing one won't do — most software already ships one.
  • No mesh until app-level networking pain is real; the proxies cost latency and ops.
  • No multi-cluster until one cluster genuinely can't meet blast-radius or locality needs.
  • Advanced tools earn their keep at scale — and add weight before it.