7 min read

K8S Series Part 01 - kubectl: How You Actually Talk to a Cluster


Before any of the fancy GitOps automation, there's one tool you reach for to see what
the cluster is doing: kubectl. Even though I almost never use it to change things
(that's Flux's job - article 02), I use it constantly to observe and debug. If you're new to Kubernetes, getting comfortable here pays off more than anything else.

This article is the mental model plus the commands I actually type.


What is kubectl, really?

A Kubernetes cluster is, at heart, one component you talk to: the API server. It
holds the desired state of everything - every Deployment, Service, Secret - in a
database (etcd). Controllers running in the cluster constantly compare desired state to
reality and act to close the gap.

kubectl is just a REST client for that API server. Every command becomes an HTTPS
call.

Two things make kubectl work:

  1. The kubeconfig (~/.kube/config by default, or $KUBECONFIG). It says which
    cluster, which user/credentials, and which namespace. On K3s the admin config
    lives at /etc/rancher/k3s/k3s.yaml on a server node - copy it to your workstation
    and fix the server: address to point at the cluster's API endpoint.
  2. A context - a named "(cluster, user, namespace)" tuple. You can hold many and
    switch between them.
kubectl config get-contexts           # list contexts
kubectl config current-context        # which one am I pointed at?
kubectl config use-context safeqbit   # switch
kubectl config set-context --current --namespace=authentik   # default ns for this context
Tip that saved me hours: the single most common "why isn't this working" cause for
beginners is being in the wrong namespace or wrong context. Check it first, always.

The object model in one breath

Everything in Kubernetes is an object with a kind. You'll meet these constantly:

Namespace      a folder that groups and isolates objects
Pod            one or more containers running together (the unit that actually runs)
Deployment     "keep N identical pods running, and roll them out safely on change"
StatefulSet    like a Deployment but for things with stable identity/storage (databases)
DaemonSet      "run exactly one pod of this on every node" (e.g. ingress, node-exporter)
Service        a stable virtual IP/DNS name in front of a set of pods
Ingress        HTTP routing rules: hostname/path ──► Service
PVC            "I want a disk this big" (PersistentVolumeClaim)
Secret/ConfigMap   key-value data injected into pods

You don't memorize this. You recognize it, and you let kubectl explain teach you the
fields when you need them:

kubectl explain deployment.spec.strategy   # what fields exist, what they mean
kubectl api-resources                       # every kind your cluster knows about

The verbs you'll use 95% of the time

get - list things

kubectl get pods                       # pods in the current namespace
kubectl get pods -A                    # across ALL namespaces
kubectl get pods -n monitoring -o wide # include node + pod IP
kubectl get pods -w                    # watch - live-update as things change
kubectl get all -n affine              # common kinds in one namespace

Output worth reading like a dashboard:

NAME                              READY   STATUS    RESTARTS      AGE
authentik-server-7c9f8b6c-2xk4d   1/1     Running   0             6d
authentik-worker-5d4b9c7f-lm8p    1/1     Running   2 (3h ago)    6d
netbox-redis-0                    0/1     Pending   0             40s
  • READY 1/1 = all containers in the pod are up. 0/1 = something's wrong.
  • STATUS - Running is good; Pending, CrashLoopBackOff, ImagePullBackOff,
    Error, Init:... each point you somewhere specific (see the playbook below).
  • RESTARTS with a recent timestamp means the container is dying and being restarted.

describe - the full story of one object

get tells you that something's wrong; describe usually tells you why. The
Events at the bottom are gold.

kubectl describe pod netbox-redis-0 -n netbox
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  41s   default-scheduler  0/3 nodes available: 3 Insufficient memory.

There's the answer: nothing wrong with the image or config - the cluster just can't fit
the pod anywhere.

logs - what the app itself said

kubectl logs deploy/authentik-server -n authentik          # current logs
kubectl logs deploy/authentik-server -n authentik -f       # follow (tail -f)
kubectl logs <pod> -n <ns> --previous                       # logs from the CRASHED instance
kubectl logs <pod> -n <ns> -c <container> --since=1h        # one container, last hour

--previous is the one people forget. When a pod is in CrashLoopBackOff, the current
container may be too young to have logs - you want the logs from the instance that just
died.

exec - open a shell inside a running container

kubectl exec -it deploy/uptime-kuma -n uptime-kuma -- sh
# one-off command without a shell:
kubectl exec -n uptime-kuma deploy/uptime-kuma -- curl -sI https://affine.local.safeqbit.com

The -- separates kubectl's flags from the command you're running inside the
container. I lean on this for in-cluster network tests - "can this pod actually reach
that service?" is a question you answer from inside the pod, not from your laptop.

apply / delete - change things (sparingly, here)

kubectl apply -f some.yaml      # create or update to match the file
kubectl delete pod <name> -n <ns>   # delete (a Deployment will recreate it)

In a GitOps cluster you mostly don't run these by hand - Flux does. But deleting a pod
to force a clean restart is a legitimate, safe operation (the controller brings back a
fresh one), and apply --dry-run=server is great for validating a manifest.


A repeatable troubleshooting workflow

When something's broken, I follow the same funnel every time. Don't randomly poke - go
top to bottom.

   ┌───────────────────────────────────────────────────────────┐
   │ 1. WHERE is it?      kubectl get pods -A | grep -v Running│
   ├───────────────────────────────────────────────────────────┤
   │ 2. WHAT state?       Pending / CrashLoop / ImagePull / …  │
   ├───────────────────────────────────────────────────────────┤
   │ 3. WHY (events)?     kubectl describe pod <p> -n <ns>     │
   ├───────────────────────────────────────────────────────────┤
   │ 4. WHAT did it say?  kubectl logs <p> -n <ns> --previous  │
   ├───────────────────────────────────────────────────────────┤
   │ 5. PROVE the fix     re-check get; reproduce from inside  │
   └───────────────────────────────────────────────────────────┘

Decode the status, then act

STATUS Usually means First thing to check
Pending Scheduler can't place the pod describe → Events: insufficient CPU/mem, no matching node, unbound PVC
ContainerCreating (stuck) Volume won't mount or image still pulling describe → Events: CSI/mount errors, large image
ImagePullBackOff Can't pull the image Wrong tag/registry, private registry without creds
CrashLoopBackOff Container starts then exits repeatedly logs --previous - bad config, missing secret, failed migration
Running but 0/1 READY Failing its readiness probe describe → probe config; logs; is the dependency (DB) up?
Init:0/1 An initContainer hasn't finished logs <pod> -c <init-container>
Terminating (stuck) Finalizer or node gone describe; check the node; last resort force-delete

Cluster-level questions

kubectl get nodes -o wide                 # are all three nodes Ready?
kubectl describe node k3s-server-02       # capacity, pressure, taints
kubectl top nodes ; kubectl top pods -A   # live CPU/memory (needs metrics-server)
kubectl get events -A --sort-by=.lastTimestamp | tail -30   # what just happened, clusterwide

That last one - events sorted by time - is my "what changed in the last few minutes?"
command. Half the time it points straight at the problem.

When the network is the problem

Some of the nastiest issues aren't the app at all - they're connectivity between pods or
to a Service. Test from inside:

# DNS resolution inside the cluster
kubectl exec -n uptime-kuma deploy/uptime-kuma -- nslookup authentik-server.authentik.svc.cluster.local

# TCP reachability to a service IP / port
kubectl exec -n uptime-kuma deploy/uptime-kuma -- nc -zv 10.10.13.50 443

# does the Service have endpoints? (no endpoints = selector matches nothing)
kubectl get endpoints -n authentik authentik-server
A real one I experienced recently: after a node reboot, pods on different nodes couldn't talk over TCP even though ping worked. The cause was a NIC checksum-offload quirk breaking VXLAN - not anything in Kubernetes. The only reason I found it quickly was testing reachability from inside a pod and noticing ICMP succeeded while TCP/UDP hung.
Lesson: embrace the TCP/IP or the OSI model mindset when troubleshooting issues that look trivial.

Quality-of-life setup

A few things I set up early and never regretted:

alias k=kubectl
source <(kubectl completion bash)        # tab-completion for resources & names
export do="--dry-run=client -o yaml"     # scaffold manifests: k create deploy x --image=y $do
  • k9s - a terminal UI over kubectl. For browsing and quick
    log/exec it's far faster than typing. I still drop to raw kubectl for anything
    precise or scriptable.
  • kubectl explain beats googling field names.
  • Set your namespace in the context so you stop typing -n for the app you're working on.

What to take away

  • kubectl is a REST client; the kubeconfig says where/who, the context bundles it up.
  • get → describe → logs → exec is the funnel. Events and --previous logs are where
    the answers usually hide.
  • Read pod STATUS as a signpost, not noise - each value points at a different layer.
  • When the symptoms don't match the app, test connectivity from inside a pod and suspect
    the layer below.

Now that you can see the cluster, the next question is how anything gets into it
without me running kubectl apply all day. That's GitOps.

Resources

This cluster

Documentation


Next lesson → In GitOps with Flux, we'll delete the "works on my cluster" problem for good: how a single git push becomes running infrastructure, and why I almost never type kubectl apply by hand.

-- ms
-- ms
Measuring the internet...