A 38-minute deep dive that picks up where the intro deck left off. We assume you already know Pods, Deployments, and Services, and go down a layer: the control plane, operators and CRDs, networking internals, security, service mesh, and running fleets with GitOps.
Part 1 introduced the idea that you declare desired state and the cluster makes it real (recap in the intro deck). Now we open the box. Every "Kubernetes did a thing" is some controller watching the API server, comparing desired to observed, and nudging reality one step closer. No central brain issues commands — dozens of small loops each own one slice.
Only the API server reads and writes etcd. Every other component is a client that watches and acts.
An informer turns a watch stream into a local cache plus a work queue of changed keys; the reconcile function runs once per item and can put the item back on the queue (on error, or to check again later).
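The informer, queue, and reconcile pipeline can be sketched in plain Python. This is a single-threaded toy, not client-go: WorkQueue, Informer, process_next, and the observed dict are hypothetical names, and a real queue also rate-limits requeues and never hands the same key to two workers at once. The reconcile function moves observed state one step toward desired state and asks to be requeued until they match.

```python
from collections import deque


class WorkQueue:
    """Deduplicating FIFO of object keys (a simplified stand-in for client-go's workqueue)."""

    def __init__(self):
        self._items = deque()
        self._pending = set()

    def add(self, key):
        # Many events for the same object collapse into one pending item.
        if key not in self._pending:
            self._pending.add(key)
            self._items.append(key)

    def get(self):
        key = self._items.popleft()
        self._pending.discard(key)
        return key

    def __len__(self):
        return len(self._items)


class Informer:
    """Applies watch events to a local cache and enqueues the changed object's key."""

    def __init__(self, queue):
        self.cache = {}
        self.queue = queue

    def on_event(self, event_type, key, obj=None):
        # event_type mirrors Kubernetes watch events: ADDED, MODIFIED, DELETED.
        if event_type == "DELETED":
            self.cache.pop(key, None)
        else:
            self.cache[key] = obj
        self.queue.add(key)


# Hypothetical stand-in for the real-world state the controller manages.
observed = {}


def reconcile_replicas(key, obj):
    """Move observed replicas one step toward desired; return True to requeue."""
    if obj is None:  # object was deleted: tear down what we created
        observed.pop(key, None)
        return False
    want, have = obj["replicas"], observed.get(key, 0)
    if have != want:
        observed[key] = have + (1 if have < want else -1)
    return observed.get(key, 0) != want


def process_next(queue, cache, reconcile):
    """Pop one key, reconcile it against the cached object, requeue if asked."""
    key = queue.get()
    if reconcile(key, cache.get(key)):
        queue.add(key)
    return key
```

Two events for the same object produce a single queue item, which is why a reconciler should read the latest cached state rather than react to individual events.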
The control loop isn't just for built-in types. You can add your own object kinds with a CRD, then write a controller that reconciles them — an operator. That is how Postgres, Kafka, cert-manager, and Argo all become first-class "kinds" you manage with kubectl apply.
kind: Database is as real as kind: Pod. It is stored in etcd, validated by an OpenAPI schema, served over REST, RBAC-controlled, and watchable. A CRD adds the noun; it does nothing on its own.
You write the Database; the operator reconciles it into the built-in objects that actually run Postgres.
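A minimal sketch of what "reconciles it into the built-in objects" can mean, assuming a hypothetical Database spec with version, replicas, and storage fields and a made-up API group example.com/v1. The function renders a headless Service and a StatefulSet as plain dicts; each carries an ownerReference back to the Database so the Kubernetes garbage collector removes them when the Database is deleted. Credentials, configuration, and probes are deliberately left out.

```python
API_VERSION = "example.com/v1"  # hypothetical group/version of the Database CRD


def owner_ref(db):
    """ownerReference pointing a child object back at its Database."""
    meta = db["metadata"]
    return {"apiVersion": API_VERSION, "kind": "Database", "name": meta["name"],
            "uid": meta["uid"], "controller": True, "blockOwnerDeletion": True}


def desired_children(db):
    """Render the built-in objects a (hypothetical) Database should own."""
    meta, spec = db["metadata"], db["spec"]
    name = meta["name"]
    labels = {"app.kubernetes.io/instance": name}
    common = {"namespace": meta["namespace"], "labels": labels,
              "ownerReferences": [owner_ref(db)]}
    service = {  # headless Service giving each Postgres pod a stable DNS name
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": name, **common},
        "spec": {"clusterIP": "None", "selector": labels, "ports": [{"port": 5432}]},
    }
    statefulset = {
        "apiVersion": "apps/v1", "kind": "StatefulSet",
        "metadata": {"name": name, **common},
        "spec": {
            "serviceName": name,
            "replicas": spec.get("replicas", 1),
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": "postgres",
                    "image": f"postgres:{spec['version']}",
                    "ports": [{"containerPort": 5432}],
                }]},
            },
            "volumeClaimTemplates": [{  # one persistent volume per replica
                "metadata": {"name": "data"},
                "spec": {"accessModes": ["ReadWriteOnce"],
                         "resources": {"requests": {"storage": spec.get("storage", "10Gi")}}},
            }],
        },
    }
    return [service, statefulset]
```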
A well-behaved operator never writes spec (the user owns that) and reports progress only in status. The /status subresource lets it update status without bumping the object's metadata.generation, so a controller that reacts only to generation changes (or compares against status.observedGeneration) does not trigger its own reconcile in a loop.
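To see why the /status split matters, here is a toy model using the hypothetical names FakeApiServer, needs_reconcile, and reconcile: spec writes bump metadata.generation, status writes do not, and the operator records status.observedGeneration so it can tell "a new spec to act on" apart from "my own status update came back as an event".

```python
import copy


class FakeApiServer:
    """Toy model of how the API server versions a custom resource with /status enabled."""

    def __init__(self, obj):
        self.obj = copy.deepcopy(obj)
        self.obj["metadata"]["generation"] = 1
        self.obj.setdefault("status", {})

    def update_spec(self, spec):
        # Write to the main resource: a real spec change bumps metadata.generation.
        if spec != self.obj["spec"]:
            self.obj["spec"] = copy.deepcopy(spec)
            self.obj["metadata"]["generation"] += 1

    def update_status(self, status):
        # Write to /status: status changes, generation does not.
        self.obj["status"] = copy.deepcopy(status)


def needs_reconcile(obj):
    """True when the spec has changed since the operator last acted on it."""
    return obj["status"].get("observedGeneration") != obj["metadata"]["generation"]


def reconcile(api):
    """Act only on new generations; return True if any work was done."""
    obj = api.obj
    if not needs_reconcile(obj):
        return False  # e.g. the event was our own status write echoing back
    # ... converge child objects for obj["spec"] here ...
    api.update_status({"phase": "Ready",
                       "observedGeneration": obj["metadata"]["generation"]})
    return True
```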
A finalizer is a string on an object that blocks deletion until the operator removes it. On a delete, the API server only sets metadata.deletionTimestamp and the object stays in place; the operator runs teardown (delete the cloud disk, deregister DNS), then drops its finalizer, and once no finalizers remain Kubernetes completes the delete.
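The deletion handshake can be modeled in a few lines. Store is a hypothetical stand-in for the API server and the finalizer name databases.example.com/cleanup is made up; the point is the ordering: add the finalizer before creating anything external, and remove it only after teardown has run.

```python
from datetime import datetime, timezone

FINALIZER = "databases.example.com/cleanup"  # hypothetical finalizer name


class Store:
    """Toy API server: delete only marks objects that still carry finalizers."""

    def __init__(self):
        self.objects = {}

    def create(self, name, obj):
        obj.setdefault("metadata", {}).setdefault("finalizers", [])
        self.objects[name] = obj

    def delete(self, name):
        meta = self.objects[name]["metadata"]
        if meta["finalizers"]:
            # Deletion is recorded, but the object stays until finalizers are gone.
            meta.setdefault("deletionTimestamp", datetime.now(timezone.utc).isoformat())
        else:
            del self.objects[name]

    def update(self, name, obj):
        self.objects[name] = obj
        meta = obj["metadata"]
        if "deletionTimestamp" in meta and not meta["finalizers"]:
            del self.objects[name]  # last finalizer removed: deletion completes


def reconcile(store, name, teardown):
    obj = store.objects.get(name)
    if obj is None:
        return  # already gone
    meta = obj["metadata"]
    if "deletionTimestamp" not in meta:
        if FINALIZER not in meta["finalizers"]:
            meta["finalizers"].append(FINALIZER)  # claim it before creating external resources
            store.update(name, obj)
        return
    if FINALIZER in meta["finalizers"]:
        teardown(obj)  # e.g. delete the cloud disk, deregister DNS
        meta["finalizers"].remove(FINALIZER)
        store.update(name, obj)
```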
Kubebuilder: the community project that wraps controller-runtime; it scaffolds CRD types, the manager, and reconcilers in Go.
Pro: closest to upstream, minimal magic, the de-facto base everything else builds on.
Con: Go-only and lower-level; you wire more yourself.
Operator SDK: Red Hat's toolkit built on top of Kubebuilder, adding Helm- and Ansible-based operators plus OLM packaging and scorecard tests.
Pro: non-Go paths (Helm/Ansible) and a richer release story.
Con: more layers and conventions to learn.
How to choose: both produce the same controller-runtime reconciler under the hood. Pick Kubebuilder for a lean Go operator close to upstream; pick Operator SDK when you want a Helm/Ansible path or OLM distribution. Before writing any operator, check whether one already exists — most popular software does.
Part 1 said "every pod gets an IP and a Service load-balances across them." True, but who assigns those IPs, and what turns a Service's virtual IP into a real pod? The answers are CNI for pod networking, kube-proxy (or its eBPF replacements) for Services, and NetworkPolicy for the firewall.
There is no Service process. The node's kernel rewrites the VIP to one healthy endpoint — load balancing happens in the dataplane.
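In its iptables mode, kube-proxy implements this with a chain of rules per Service, one per endpoint, using the statistic match in random mode. Rule i of n (counting from 0) matches with probability 1/(n - i), which works out to an even 1/n share per endpoint. The Python below models only that arithmetic, and its function names are hypothetical; IPVS and eBPF dataplanes choose endpoints differently.

```python
import random
from collections import Counter


def rule_probabilities(n):
    """Match probability of rule i in a chain of n endpoint rules: 1/(n - i)."""
    return [1.0 / (n - i) for i in range(n)]


def pick_endpoint(endpoints, rng):
    """Walk the rules in order; the first one whose random test passes wins."""
    if not endpoints:
        return None  # real kube-proxy rejects traffic to a Service with no endpoints
    for endpoint, p in zip(endpoints, rule_probabilities(len(endpoints))):
        if rng.random() < p:  # the last rule has p == 1.0, so it always matches
            return endpoint


def effective_shares(n):
    """Overall chance per endpoint: P(all earlier rules miss) * P(this rule hits)."""
    shares, miss_so_far = [], 1.0
    for p in rule_probabilities(n):
        shares.append(miss_so_far * p)
        miss_so_far *= 1.0 - p
    return shares


def sample(endpoints, trials, seed=0):
    """Count how often each endpoint is picked over many simulated connections."""
    rng = random.Random(seed)
    return Counter(pick_endpoint(endpoints, rng) for _ in range(trials))
```

The falling probabilities compensate for the rules already skipped: for three endpoints the chain is 1/3, 1/2, 1, and each endpoint still ends up with a third of the traffic.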
The name api resolves via cluster DNS (CoreDNS) to the Service's ClusterIP.
A cluster has many ways to be too permissive: identities that can do anything, pods that run as root, secrets in plaintext, and images nobody verified. The defences stack: RBAC for who, Pod Security for what a pod may do, encrypted secrets, and supply-chain checks for what you even allow to run.
A Role grants verbs (get, list, create) on resource kinds. A RoleBinding attaches that role to a subject: a user, group, or ServiceAccount. RBAC is purely additive: there are no deny rules, so you grant up from zero.
Subject + Role = RoleBinding. Verbs and resources live in the Role; the binding just says "this identity may use it."
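RBAC's "any grant wins, nothing denies" evaluation fits in one loop. This sketch is deliberately narrow: it handles namespaced RoleBindings and the "*" wildcard but ignores ClusterRoles, group membership, API groups, and resourceNames. The types and function names are hypothetical, not the real authorizer; the subject string uses the system:serviceaccount:&lt;namespace&gt;:&lt;name&gt; username form Kubernetes gives ServiceAccounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    verbs: frozenset      # e.g. {"get", "list"}; "*" means any verb
    resources: frozenset  # e.g. {"pods"}; "*" means any resource


@dataclass(frozen=True)
class RoleBinding:
    namespace: str        # a RoleBinding only grants inside its own namespace
    role: str             # name of a Role in that namespace
    subjects: frozenset   # usernames of users or ServiceAccounts


def _matches(value, allowed):
    return "*" in allowed or value in allowed


def is_allowed(subject, verb, resource, namespace, roles, bindings):
    """Allow if ANY binding gives this subject a rule covering the request.

    roles maps (namespace, role_name) -> list of Rule. There are no deny
    rules: without a matching grant the answer is simply no.
    """
    for b in bindings:
        if b.namespace != namespace or subject not in b.subjects:
            continue
        for rule in roles.get((b.namespace, b.role), []):
            if _matches(verb, rule.verbs) and _matches(resource, rule.resources):
                return True
    return False
```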
Pod Security Admission enforces one of three profiles (privileged, baseline, restricted) per namespace, selected by a simple label such as pod-security.kubernetes.io/enforce: restricted. Aim for restricted: no root, no privilege escalation, dropped capabilities.
Encrypt Secrets at rest and restrict who can get secrets via RBAC, or keep them in an external store (Vault, cloud secret managers) and sync them in with the External Secrets Operator.
Set automountServiceAccountToken: false where a pod doesn't call the API at all.
OPA/Gatekeeper: Open Policy Agent with the Gatekeeper admission controller. Policies are written in Rego, a purpose-built policy language reusable far beyond Kubernetes.
Pro: extremely expressive; one policy engine across CI, APIs, and clusters.
Con: Rego has a real learning curve.
Kyverno: a Kubernetes-native engine where policies are Kubernetes resources written in YAML. It can validate, mutate, and generate resources without a new language.
Pro: no DSL; mutate/generate built in; instantly familiar.
Con: Kubernetes-only and less expressive for complex logic.
How to choose: want policy that spans more than Kubernetes, or truly complex rules? OPA/Gatekeeper. Want to ship guardrails this afternoon in plain YAML, including mutation? Kyverno. Both run as validating/mutating webhooks — the choice is language, not mechanism.
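To make "the choice is language, not mechanism" concrete, here is the shape of a hand-rolled validating webhook handler in Python. It takes the request part of an admission.k8s.io/v1 AdmissionReview and returns a response with the same uid, an allowed flag, and, on denial, a status message. The checks cover only a small subset of the restricted Pod Security profile (runAsNonRoot, allowPrivilegeEscalation, dropping ALL capabilities); the function names are hypothetical, and TLS serving and webhook registration are left out.

```python
def violations(pod):
    """Return human-readable problems for a subset of the restricted profile."""
    spec = pod.get("spec", {})
    pod_sc = spec.get("securityContext", {})
    problems = []
    for c in spec.get("initContainers", []) + spec.get("containers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "?")
        # A container-level setting overrides the pod-level one.
        if sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot")) is not True:
            problems.append(f"{name}: runAsNonRoot must be true")
        if sc.get("allowPrivilegeEscalation") is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
    return problems


def review(admission_review):
    """Build the AdmissionReview response for one incoming request."""
    req = admission_review["request"]
    problems = violations(req["object"])
    response = {"uid": req["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"code": 403, "message": "; ".join(problems)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}
```

Gatekeeper and Kyverno package this same request/response loop; what differs is whether the rule inside it is expressed in Rego or in YAML.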
Once you have many services, you want encrypted service-to-service traffic, retries, and fine-grained traffic splitting — without building it into every app. A service mesh puts a proxy beside each workload and manages all of that from the platform layer.
The app still calls http://orders as before; the proxy transparently adds mTLS, retries, timeouts, and metrics.
The proxy encrypts every hop and splits traffic by weight (90% stable, 10% to the canary) with no app change.
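A weighted split like this boils down to a per-request weighted random choice. The sketch assumes hypothetical backends orders-stable and orders-canary with integer weights; in Istio, for example, such weights live on a VirtualService route. Over many requests the canary share converges on 10%, but each request is an independent draw, so short windows can deviate.

```python
import random
from collections import Counter


def pick_backend(routes, rng):
    """Choose one backend with probability weight / total weight.

    routes: list of (backend_name, integer_weight) pairs, e.g. a 90/10 canary split.
    """
    total = sum(weight for _, weight in routes)
    r = rng.randrange(total)  # uniform integer in [0, total)
    for backend, weight in routes:
        if r < weight:
            return backend
        r -= weight
    raise RuntimeError("unreachable: r is always below the total weight")


def simulate(routes, requests, seed=0):
    """Count where a stream of requests lands."""
    rng = random.Random(seed)
    return Counter(pick_backend(routes, rng) for _ in range(requests))
```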
Istio: Envoy-based, with deep traffic management, policy, and multi-cluster support; now offers both sidecar and sidecarless ambient modes.
Pro: the most capable; handles complex routing and large multi-cluster setups.
Con: large surface area and real operational weight.
Linkerd: a CNCF-graduated mesh built on a purpose-built Rust micro-proxy, optimized for simplicity and low overhead.
Pro: simple, fast, easy to operate; mTLS on by default.
Con: fewer advanced traffic features than Istio.
How to choose: want maximum control, complex routing, or multi-cluster? Istio (consider ambient mode to cut overhead). Want mTLS and golden metrics with the least operational burden? Linkerd. And remember a mesh is not free — adopt one only when app-level networking pain justifies the proxies.
At scale, two problems dominate: keeping many clusters in a known, audited state, and right-sizing both pods and nodes as load moves. GitOps answers the first; a stack of autoscalers answers the second.
The agent pulls from Git and reconciles continuously — a change made by hand drifts, then gets reverted to match the repo.
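Drift detection and self-heal reduce to "diff Git against live, re-apply what differs, and optionally prune what Git no longer declares." The sketch keys objects by a hypothetical Kind/namespace/name string and compares whole objects; real agents such as Argo CD (with selfHeal and prune enabled) or Flux (with prune enabled) compute smarter diffs that ignore server-defaulted fields.

```python
import copy


def diff(desired, live):
    """Keys whose live value differs from Git (or is missing), plus live-only keys."""
    drifted = {k for k in desired if live.get(k) != desired[k]}
    extra = set(live) - set(desired)
    return drifted, extra


def sync(desired, live, prune=True):
    """Make live match Git in place; return the sorted keys that were changed."""
    drifted, extra = diff(desired, live)
    for key in drifted:
        live[key] = copy.deepcopy(desired[key])  # re-apply the Git version
    if prune:
        for key in extra:
            del live[key]  # delete what Git no longer declares
    return sorted(drifted | (extra if prune else set()))
```

Run on a timer or on every Git change, this loop is what reverts a hand-made kubectl edit.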
Argo CD: a controller with a rich dashboard showing sync status and live-vs-Git diffs; pairs with Argo Rollouts for progressive delivery.
Pro: excellent visibility; gentle on-ramp; strong UI.
Con: another sizeable system to run and secure.
Flux: a set of small, composable controllers, Git-native and CLI-first, that integrate cleanly with Helm and Kustomize.
Pro: minimal, modular, easy to embed in automation.
Con: no built-in UI as rich as Argo's.
How to choose: want a dashboard and an easy start? Argo CD. Want a minimal, composable, automation-first setup? Flux. Both make Git the source of truth; the difference is surface area, not idea.
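For the other half of the scaling problem, right-sizing pods as load moves, the pod-count layer is usually the HorizontalPodAutoscaler. Its documented core rule is desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default). The function below is a simplified sketch of that rule with minReplicas/maxReplicas clamping; it ignores stabilization windows, scaling policies, and not-yet-ready pods, and its name is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """HPA-style rule: desired = ceil(current * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change, avoids flapping
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))  # respect the HPA's bounds
```

For example, 4 pods averaging 90% CPU against a 60% target give a ratio of 1.5, so the rule asks for 6 pods.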
Tie it together with the shape of a tiny operator that codifies a policy: "every Database must have an off-site backup." It is the same reconcile loop from section 1, now running your noun.
Watch the resource, enforce the rule, converge the built-in objects, report status, requeue. That single pattern powers every controller in the cluster.
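A sketch of that backup-policy operator's reconcile pass, assuming a hypothetical Database with spec.backup.destination and spec.backup.schedule, a made-up notion of "off-site" (an s3://, gs://, or azblob:// URL), and a hypothetical backup image. It follows the earlier rules: it never edits spec, converges a built-in CronJob, reports a BackupReady condition in status, and returns a requeue delay in seconds.

```python
OFFSITE_PREFIXES = ("s3://", "gs://", "azblob://")  # hypothetical "off-site" test


def backup_cronjob(db):
    """Render the CronJob that ships backups to the configured destination."""
    meta, backup = db["metadata"], db["spec"]["backup"]
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": f"{meta['name']}-backup", "namespace": meta["namespace"]},
        "spec": {
            "schedule": backup.get("schedule", "0 2 * * *"),  # default: daily at 02:00
            "jobTemplate": {"spec": {"template": {"spec": {
                "restartPolicy": "OnFailure",
                "containers": [{
                    "name": "backup",
                    "image": "example.com/pg-backup:latest",  # hypothetical image
                    "args": ["--to", backup["destination"]],
                }],
            }}}},
        },
    }


def reconcile(db, children):
    """One pass: enforce the rule, converge children, return (status, requeue_after)."""
    dest = db["spec"].get("backup", {}).get("destination", "")
    if not dest.startswith(OFFSITE_PREFIXES):
        # Enforce by reporting, never by rewriting the user's spec.
        status = {"conditions": [{"type": "BackupReady", "status": "False",
                                  "reason": "NoOffsiteBackup"}]}
        return status, 60  # re-check soon in case the user fixes it
    job = backup_cronjob(db)
    children[job["metadata"]["name"]] = job  # create or overwrite to match desired
    status = {"conditions": [{"type": "BackupReady", "status": "True",
                              "reason": "CronJobConfigured"}],
              "observedGeneration": db["metadata"].get("generation")}
    return status, 3600  # periodic re-check
```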