Essential kubectl Commands for Troubleshooting

A practical reference of kubectl commands built from real-world platform engineering work — covering pod debugging, node inspection, events, secrets, certificates, ArgoCD, Linkerd service mesh, and GKE-specific patterns.

Tip: Set alias k=kubectl in your shell profile to save keystrokes. All commands below work with both kubectl and the k alias.

🔑 Basics & Context

# Show current context and cluster
kubectl config current-context
kubectl config get-contexts

# Switch context
kubectl config use-context <context-name>

# Set default namespace for current context (avoid typing -n every time)
kubectl config set-context --current --namespace=<namespace>

# List all namespaces
kubectl get namespaces

# Get everything in a namespace
kubectl get all -n <namespace>

# Quick overview of cluster health
kubectl get nodes
kubectl get pods -A | grep -v Running | grep -v Completed

🫛 Pods

Listing and Status

# All pods across all namespaces
kubectl get pods -A

# Pods with node placement and IP
kubectl get pods -n <namespace> -o wide

# Watch pod status in real-time
kubectl get pods -n <namespace> -w

# Show pods with labels
kubectl get pods -n <namespace> --show-labels

# Filter pods by label
kubectl get pods -n <namespace> -l app=my-api

# List only non-running pods (quick health check)
kubectl get pods -A | grep -Ev 'Running|Completed'
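
The grep filter above misses pods that report Running while some containers are not ready (for example READY 0/1). A minimal sketch of a stricter filter follows; unhealthy_pods is a hypothetical helper name, and it assumes the default column layout of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Print pods whose READY count is short (e.g. 0/1) or whose STATUS is
# neither Running nor Completed. Reads `kubectl get pods -A` output on stdin.
# unhealthy_pods is a hypothetical helper, not a kubectl feature.
unhealthy_pods() {
  awk '
    NR == 1 { print; next }                 # keep the header row
    {
      split($3, ready, "/")                 # READY column, e.g. "1/2"
      if ($4 == "Completed") next           # finished Job pods are fine
      if ($4 != "Running" || ready[1] != ready[2]) print
    }
  '
}

# Usage (assumes kubectl is configured for the target cluster):
#   kubectl get pods -A | unhealthy_pods
```

Because the helper only reads text on stdin, it can also be run against saved output from an earlier incident.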

Describing and Debugging

# Full pod detail — Events section is most useful for scheduling issues
kubectl describe pod <pod-name> -n <namespace>

# Check why a pod is Pending or CrashLoopBackOff
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 Events

# Show CPU and memory requests on all pods
kubectl get pods -n <namespace> -o custom-columns=\
'NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

Logs

# Pod logs
kubectl logs <pod-name> -n <namespace>

# Follow logs in real-time
kubectl logs -f <pod-name> -n <namespace>

# Last N lines
kubectl logs --tail=100 <pod-name> -n <namespace>

# Logs from a specific container in a multi-container pod
kubectl logs <pod-name> -c <container-name> -n <namespace>

# Logs from previous (crashed) container instance
kubectl logs <pod-name> --previous -n <namespace>

# Logs from all pods matching a label
kubectl logs -l app=my-api -n <namespace> --tail=50

Exec and Debug

# Open a shell in a running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Run a one-off command in a pod
kubectl exec <pod-name> -n <namespace> -- env
kubectl exec <pod-name> -n <namespace> -- cat /etc/resolv.conf

# Exec into a specific container in a multi-container pod
kubectl exec -it <pod-name> -c <container-name> -n <namespace> -- /bin/sh

# Run a temporary debug pod (useful when app containers have no shell)
kubectl run debug --image=busybox --restart=Never -it --rm -- /bin/sh

# Debug a specific node (runs a pod in the node's host namespaces; the node filesystem is mounted at /host)
kubectl debug node/<node-name> -it --image=ubuntu

Scaling and Rollouts

# Scale a deployment
kubectl scale deployment <name> --replicas=3 -n <namespace>

# Check rollout status
kubectl rollout status deployment/<name> -n <namespace>

# Rollout history
kubectl rollout history deployment/<name> -n <namespace>

# Roll back to previous version
kubectl rollout undo deployment/<name> -n <namespace>

# Restart all pods in a deployment (triggers rolling restart)
kubectl rollout restart deployment/<name> -n <namespace>
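
The status and undo commands above can be combined into a guarded rollout check. This is a sketch under stated assumptions: rollout_or_undo is a hypothetical helper name and the 120s default timeout is arbitrary, so adjust both to your environment.

```shell
# Wait for a deployment rollout; roll back if it does not finish in time.
# rollout_or_undo is a hypothetical helper; the 120s default is an assumption.
rollout_or_undo() {
  local name="$1" ns="$2" timeout="${3:-120s}"
  if kubectl rollout status "deployment/$name" -n "$ns" --timeout="$timeout"; then
    echo "rollout of $name succeeded"
  else
    echo "rollout of $name did not complete within $timeout; rolling back" >&2
    kubectl rollout undo "deployment/$name" -n "$ns"
    return 1
  fi
}

# Usage:
#   rollout_or_undo my-api my-namespace 180s
```

The function returns non-zero after a rollback, so a CI job that calls it still fails visibly.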

🖥️ Nodes

# Node status
kubectl get nodes
kubectl get nodes -o wide

# Full node detail (conditions, capacity, allocated resources, taints)
kubectl describe node <node-name>

# Show all taints across nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Show all node labels
kubectl get nodes --show-labels

# Filter nodes by label
kubectl get nodes -l nodepool=gpu

# Resource usage per node
kubectl top nodes

# Resource usage per pod
kubectl top pods -n <namespace>
kubectl top pods -A --sort-by=memory

# Check allocated resources vs capacity on a node
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

# Cordon a node (stop new pods scheduling on it)
kubectl cordon <node-name>

# Uncordon a node
kubectl uncordon <node-name>

# Drain a node (cordons it, then evicts all pods except DaemonSet pods)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

📋 Events

Events are the fastest way to understand what Kubernetes is doing or why something failed. Always check events before diving into logs.

# All events in a namespace, sorted by time (most recent last)
kubectl get events -n <namespace> --sort-by=.lastTimestamp

# All events across the cluster
kubectl get events -A --sort-by='.lastTimestamp' | tail -30

# Watch events in real-time
kubectl get events -n <namespace> -w

# Watch only Pod-related events
kubectl get events -n <namespace> -w --field-selector involvedObject.kind=Pod

# Warning events only (skip normal)
kubectl get events -A --field-selector type=Warning

# Events for a specific object
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-or-deployment-name>

# Events with wide output (shows source component)
kubectl get events -n <namespace> -o wide --sort-by=.lastTimestamp

Real-world pattern: When a deployment is not rolling out, run kubectl get events -n <ns> -w --field-selector involvedObject.kind=Pod in one terminal and kubectl get pods -n <ns> -w in another. The event stream shows, in real time, what is blocking each pod (for example FailedScheduling, FailedMount, or image pull failures).
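
When a cluster is noisy, it can help to count Warning events by reason before reading them one by one. The sketch below builds on the Warning filter shown earlier; warning_reasons is a hypothetical helper name.

```shell
# Summarize Warning events across the cluster by reason, most frequent first.
# warning_reasons is a hypothetical helper, not a kubectl feature.
warning_reasons() {
  kubectl get events -A --field-selector type=Warning \
    -o custom-columns=REASON:.reason --no-headers |
    sort | uniq -c | sort -rn
}

# Usage:
#   warning_reasons | head
```

A reason that dominates the list (for instance repeated FailedScheduling) is usually the right place to start describing individual objects.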

🌐 Networking & Services

# List services
kubectl get svc -n <namespace>
kubectl get svc -A

# Describe a service (check endpoints, port mappings)
kubectl describe svc <service-name> -n <namespace>

# Check endpoints: if empty, no Ready pod matches the service selector
kubectl get endpoints <service-name> -n <namespace>

# Port-forward a service to localhost (useful for testing without ingress)
kubectl port-forward service/<service-name> 8080:80 -n <namespace>

# Port-forward a specific pod
kubectl port-forward pod/<pod-name> 8080:8080 -n <namespace>

# List ingresses
kubectl get ingress -n <namespace>
kubectl describe ingress <ingress-name> -n <namespace>

# DNS check from inside cluster (run in debug pod)
kubectl run dns-test --image=busybox --restart=Never -it --rm -- \
  nslookup <service-name>.<namespace>.svc.cluster.local

# Check network policies
kubectl get networkpolicy -n <namespace>
kubectl describe networkpolicy <name> -n <namespace>
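
The endpoints check above can be wrapped for use in scripts. A minimal sketch, assuming the classic Endpoints object layout (subsets, then addresses); check_endpoints is a hypothetical helper name.

```shell
# Report whether a Service has any ready backend addresses.
# check_endpoints is a hypothetical helper, not a kubectl feature.
check_endpoints() {
  local svc="$1" ns="$2" ips
  # Only ready addresses appear under .subsets[*].addresses
  ips="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  if [ -z "$ips" ]; then
    echo "$svc: no ready endpoints (check selector labels and pod readiness)"
    return 1
  fi
  echo "$svc: ready endpoints: $ips"
}

# Usage:
#   check_endpoints my-service my-namespace
```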

🔒 Secrets & Certificates

# List secrets
kubectl get secrets -n <namespace>

# Decode a secret value (base64)
kubectl get secret <secret-name> -n <namespace> \
  -o jsonpath="{.data.<key>}" | base64 -d && echo

# Get ArgoCD initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo

# View all keys in a secret (without values)
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data}' | \
  python3 -c "import sys,json; [print(k) for k in json.load(sys.stdin)]"

# Create a TLS secret from cert and key files
kubectl create secret tls <secret-name> \
  --cert=tls.crt \
  --key=tls.key \
  -n <namespace>

# Create a generic secret
kubectl create secret generic <secret-name> \
  --from-literal=username=admin \
  --from-literal=password=mysecret \
  -n <namespace>
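
If python3 is not available on the machine you are working from, the key listing shown earlier can be done with kubectl's go-template output instead. This is a sketch; secret_keys is a hypothetical helper name.

```shell
# List the keys of a secret without printing any values.
# secret_keys is a hypothetical helper, not a kubectl feature.
secret_keys() {
  kubectl get secret "$1" -n "$2" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Usage:
#   secret_keys my-secret my-namespace
```

Keeping values out of the output avoids leaking them into terminal scrollback or CI logs.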

cert-manager Certificates

# List all certificates
kubectl get certificate -A

# Check if a certificate is Ready
kubectl get certificate -n <namespace>
# Ready column should show True

# Describe a certificate (shows renewal time, issuer, events)
kubectl describe certificate <cert-name> -n <namespace>

# Check CertificateRequests (shows signing history)
kubectl get certificaterequest -n <namespace>

# Check cert-manager pods are running
kubectl get pods -n cert-manager

# Check Issuer or ClusterIssuer status
kubectl get clusterissuer
kubectl describe clusterissuer <issuer-name>

# Check Linkerd identity issuer certificate
kubectl get certificate -n linkerd
kubectl describe certificate linkerd-identity-issuer -n linkerd

🔍 JSONPath & Output Formatting

JSONPath lets you extract exactly the field you need without parsing full YAML output.

# Get pod's node assignment
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.spec.nodeName}'

# Get all pod names in a namespace
kubectl get pods -n <namespace> \
  -o jsonpath='{.items[*].metadata.name}'

# Get image used by each container in a deployment
kubectl get deployment <name> -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[*].image}'

# Get namespace labels (e.g. verify compute class label on GKE)
kubectl get namespace <namespace> \
  -o jsonpath='{.metadata.labels}'

# Get nodeSelector injected into a pod (GKE autopilot check)
kubectl get pod <pod-name> -n <namespace> \
  -o yaml | grep -A 1 nodeSelector

# Check init container restart policies (Linkerd native sidecar check)
kubectl get pod -n <namespace> \
  -o jsonpath='{range .items[0].spec.initContainers[*]}{.name}: restartPolicy={.restartPolicy}{"\n"}{end}'

# Get all container images running across the cluster
kubectl get pods -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{range .spec.containers[*]}{.image}{"\n"}{end}{end}'

# Custom columns output
kubectl get pods -n <namespace> \
  -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName,IP:.status.podIP

Quick tip: Use -o yaml first to explore the full object structure, then build your JSONPath expression from the field paths you find.

⚙️ ArgoCD

Commands for managing GitOps applications with ArgoCD, via both kubectl and the argocd CLI.

Via kubectl

# Get all ArgoCD applications
kubectl -n argocd get applications

# Get ApplicationSets
kubectl -n argocd get applicationsets

# Describe an application (shows sync status, health, conditions)
kubectl -n argocd describe application <app-name>

# Check ArgoCD pods are healthy
kubectl get pods -n argocd

# Get initial admin password (first login)
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo

# Port-forward ArgoCD UI to localhost
kubectl port-forward service/argocd-server -n argocd 8080:443

# Apply root App-of-Apps (cluster bootstrap)
kubectl apply -f bootstrap/<cluster>-root-app.yaml

# Watch application sync in real-time
kubectl -n argocd get applications -w

Via argocd CLI

# Login (after port-forward or via ingress)
argocd login localhost:8080 --insecure

# List all apps and their sync/health status
argocd app list

# Sync an application manually
argocd app sync <app-name>

# Force sync (uses a forced apply: resources that cannot be patched are deleted and re-created)
argocd app sync <app-name> --force

# Check application diff (what would change on next sync)
argocd app diff <app-name>

# Get application details
argocd app get <app-name>

# Hard refresh (re-fetches from Git)
argocd app get <app-name> --hard-refresh

Sync vs Health: An application can be Synced but Degraded, meaning ArgoCD applied all manifests but the workload itself is not healthy (e.g. pods in CrashLoopBackOff). Always check both columns in argocd app list.
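
Checking both columns can be scripted with kubectl alone. The sketch below reads the sync and health fields from the Application status; argocd_unhealthy_apps is a hypothetical helper name, and it assumes ArgoCD is installed in the argocd namespace as in the commands above.

```shell
# List ArgoCD applications that are not both Synced and Healthy.
# argocd_unhealthy_apps is a hypothetical helper, not part of either CLI.
argocd_unhealthy_apps() {
  kubectl -n argocd get applications --no-headers \
    -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status |
    awk '$2 != "Synced" || $3 != "Healthy"'
}

# Usage:
#   argocd_unhealthy_apps
```

An empty result means every application reports Synced and Healthy.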

🔗 Linkerd Service Mesh

Commands for verifying Linkerd injection, checking certificate health, and debugging mesh traffic.

# Check Linkerd control plane pods are running
kubectl get pods -n linkerd

# Verify Linkerd identity issuer certificate is Ready
kubectl get certificate -n linkerd
kubectl describe certificate linkerd-identity-issuer -n linkerd

# Verify issuer secret exists
kubectl get secret linkerd-identity-issuer -n linkerd

# Check if a namespace is meshed (inject annotation present)
kubectl get namespace <namespace> -o jsonpath='{.metadata.annotations}'

# Mesh a namespace — all new pods will get the Linkerd proxy injected
kubectl annotate namespace <namespace> linkerd.io/inject=enabled

# Un-mesh a namespace
kubectl annotate namespace <namespace> linkerd.io/inject=disabled --overwrite

# Check if a pod has the Linkerd proxy sidecar injected
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'
# Should show: linkerd-proxy alongside your app container

# Verify native sidecar injection (K8s 1.29+ — restartPolicy=Always on init container)
kubectl get pod -n <namespace> \
  -o jsonpath='{range .items[0].spec.initContainers[*]}{.name}: restartPolicy={.restartPolicy}{"\n"}{end}'
# Expected: linkerd-proxy: restartPolicy=Always
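
Since the proxy can appear either as a regular container or as a native sidecar under initContainers, a single check that covers both cases may be convenient. A sketch, with has_linkerd_proxy as a hypothetical helper name:

```shell
# Succeed if the pod has a linkerd-proxy container, either as a regular
# container or as a native sidecar listed under initContainers.
# has_linkerd_proxy is a hypothetical helper, not part of the linkerd CLI.
has_linkerd_proxy() {
  local names
  names="$(kubectl get pod "$1" -n "$2" \
    -o jsonpath='{.spec.containers[*].name} {.spec.initContainers[*].name}')"
  case " $names " in
    *" linkerd-proxy "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   has_linkerd_proxy my-pod my-namespace && echo meshed || echo "not meshed"
```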

Via linkerd CLI

# Check Linkerd installation health
linkerd check

# Check data plane (proxy) health in a namespace
linkerd check --proxy -n <namespace>

# Live traffic stats for a deployment
linkerd viz stat deploy -n <namespace>

# Real-time request tap (sample live traffic)
linkerd viz tap deploy/<name> -n <namespace>

# Top routes by request volume
linkerd viz top deploy/<name> -n <namespace>

☁️ GKE-Specific

Commands and patterns specific to Google Kubernetes Engine (GKE), including Autopilot compute class management and Workload Identity.

Autopilot — Compute Class & Billing

# Set default compute class for a namespace (on-demand)
kubectl label namespace <namespace> \
  cloud.google.com/default-compute-class=autopilot --overwrite

# Set default compute class for a namespace (spot VMs — cheapest)
kubectl label namespace <namespace> \
  cloud.google.com/default-compute-class=autopilot-spot --overwrite

# Verify the label was applied
kubectl get namespace <namespace> \
  -o jsonpath='{.metadata.labels}'

# Verify nodeSelector was injected into a pod by GKE Autopilot
kubectl get pod <pod-name> -n <namespace> \
  -o yaml | grep -A 1 nodeSelector

# Remove compute class label (revert to default)
kubectl label namespace <namespace> \
  cloud.google.com/default-compute-class-

Node Pool & Cluster

# Get GKE cluster credentials (configure kubectl context)
gcloud container clusters get-credentials <cluster-name> \
  --region <region> --project <project-id>

# List nodes in a specific node pool
kubectl get nodes -l cloud.google.com/gke-nodepool=<pool-name>

# Target a specific node pool via nodeSelector
# (in pod spec)
nodeSelector:
  cloud.google.com/gke-nodepool: <pool-name>

# Check Workload Identity annotation on a K8s service account
kubectl get serviceaccount <ksa-name> -n <namespace> \
  -o jsonpath='{.metadata.annotations}'

# List Google CAS ClusterIssuers (cert-manager + Google CAS integration)
kubectl get googlecasclusterissuer

# Check Google CAS issuer status
kubectl describe googlecasclusterissuer <issuer-name>

GKE Autopilot tip: When using Autopilot, label your namespaces with autopilot-spot for batch/non-critical workloads to significantly reduce costs. Use autopilot (on-demand) for production services that cannot tolerate spot preemption.

Key Takeaways

  • Start with events, not logs. kubectl get events --sort-by=.lastTimestamp tells you what Kubernetes has been doing. Logs tell you what your app has been doing. For infrastructure issues, events come first.
  • kubectl describe is your best friend. It aggregates spec, status, and events in one view. For any resource that isn't behaving, describe it before going anywhere else.
  • Use --previous for crashed containers. If a container is in CrashLoopBackOff, kubectl logs --previous shows the logs from the last failed run — the current container may not have enough uptime to log anything useful.
  • Empty endpoints = broken service. If a Service isn't routing traffic, run kubectl get endpoints <svc>. An empty endpoints list means no Ready pod matches the Service selector: either the labels don't match or the matching pods are failing their readiness probes.
  • Port-forward for quick testing. Before debugging ingress or DNS, use kubectl port-forward to confirm the service itself works. That takes ingress and DNS out of the picture.
  • ArgoCD Synced ≠ Healthy. Always check both the Sync and Health columns. A Synced but Degraded app means manifests were applied but pods aren't running correctly.
  • JSONPath over grep for automation. Use -o jsonpath in scripts for reliable field extraction instead of parsing human-readable output with grep/awk.