Essential kubectl Commands for Troubleshooting

A practical reference of kubectl commands built from real-world platform engineering work — covering pod debugging, node inspection, events, secrets, certificates, ArgoCD, Linkerd service mesh, and GKE-specific patterns.

Tip: Set alias k=kubectl in your shell profile to save keystrokes. All commands below work with both kubectl and the k alias.

🔑 Basics & Context

# Show current context and cluster
kubectl config current-context
kubectl config get-contexts

# Switch context
kubectl config use-context <context-name>

# Set default namespace for current context (avoid typing -n every time)
kubectl config set-context --current --namespace=<namespace>

# List all namespaces
kubectl get namespaces

# Get everything in a namespace
kubectl get all -n <namespace>

# Quick overview of cluster health
kubectl get nodes
kubectl get pods -A | grep -v Running | grep -v Completed

🫛 Pods

Listing and Status

# All pods across all namespaces
kubectl get pods -A

# Pods with node placement and IP
kubectl get pods -n <namespace> -o wide

# Watch pod status in real-time
kubectl get pods -n <namespace> -w

# Show pods with labels
kubectl get pods -n <namespace> --show-labels

# Filter pods by label
kubectl get pods -n <namespace> -l app=my-api

# List only non-running pods (quick health check)
kubectl get pods -A | grep -Ev 'Running|Completed'
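
The grep filter above hides every pod whose STATUS is Running, including pods where some containers are failing readiness (for example READY 1/2). A minimal sketch of a stricter check follows. The function name unhealthy_pods is hypothetical, and it assumes the default column order of kubectl get pods -A (NAMESPACE, NAME, READY, STATUS).

```shell
# unhealthy_pods: hypothetical helper, not part of kubectl.
# Assumes default columns: NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
  kubectl get pods -A --no-headers | awk '
    $4 == "Completed" { next }      # finished Job pods are expected
    {
      split($3, ready, "/")         # READY looks like "1/2"
      if ($4 != "Running" || ready[1] != ready[2]) print
    }'
}
```

Sourcing this in a shell profile and running unhealthy_pods prints only pods that are not Running or that are Running with fewer ready containers than expected.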

Describing and Debugging

# Full pod detail — Events section is most useful for scheduling issues
kubectl describe pod <pod-name> -n <namespace>

# Check why a pod is Pending or CrashLoopBackOff
kubectl describe pod <pod-name> -n <namespace> | grep -A 20 Events

# Show CPU and memory requests for all pods (limits are not included)
kubectl get pods -n <namespace> -o custom-columns=\
'NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'

Logs

# Pod logs
kubectl logs <pod-name> -n <namespace>

# Follow logs in real-time
kubectl logs -f <pod-name> -n <namespace>

# Last N lines
kubectl logs --tail=100 <pod-name> -n <namespace>

# Logs from a specific container in a multi-container pod
kubectl logs <pod-name> -c <container-name> -n <namespace>

# Logs from previous (crashed) container instance
kubectl logs <pod-name> --previous -n <namespace>

# Logs from all pods matching a label
kubectl logs -l app=my-api -n <namespace> --tail=50

Exec and Debug

# Open a shell in a running pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Run a one-off command in a pod
kubectl exec <pod-name> -n <namespace> -- env
kubectl exec <pod-name> -n <namespace> -- cat /etc/resolv.conf

# Exec into a specific container in a multi-container pod
kubectl exec -it <pod-name> -c <container-name> -n <namespace> -- /bin/sh

# Run a temporary debug pod (useful when app containers have no shell)
kubectl run debug --image=busybox --restart=Never -it --rm -- /bin/sh

# Debug a specific node (starts a pod in the node's host namespaces; the node filesystem is mounted at /host)
kubectl debug node/<node-name> -it --image=ubuntu

Scaling and Rollouts

# Scale a deployment
kubectl scale deployment <name> --replicas=3 -n <namespace>

# Check rollout status
kubectl rollout status deployment/<name> -n <namespace>

# Rollout history
kubectl rollout history deployment/<name> -n <namespace>

# Roll back to previous version
kubectl rollout undo deployment/<name> -n <namespace>

# Restart all pods in a deployment (triggers rolling restart)
kubectl rollout restart deployment/<name> -n <namespace>

🖥️ Nodes

# Node status
kubectl get nodes
kubectl get nodes -o wide

# Full node detail (conditions, capacity, allocated resources, taints)
kubectl describe node <node-name>

# Show all taints across nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Show all node labels
kubectl get nodes --show-labels

# Filter nodes by label
kubectl get nodes -l nodepool=gpu

# Resource usage per node
kubectl top nodes

# Resource usage per pod
kubectl top pods -n <namespace>
kubectl top pods -A --sort-by=memory

# Check allocated resources vs capacity on a node
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

# Cordon a node (stop new pods scheduling on it)
kubectl cordon <node-name>

# Uncordon a node
kubectl uncordon <node-name>

# Drain a node (cordons it, then evicts all pods except DaemonSet-managed ones)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

📋 Events

Events are the fastest way to understand what Kubernetes is doing or why something failed. Always check events before diving into logs.

# All events in a namespace, sorted by time (most recent last)
kubectl get events -n <namespace> --sort-by=.lastTimestamp

# All events across the cluster
kubectl get events -A --sort-by='.lastTimestamp' | tail -30

# Watch events in real-time
kubectl get events -n <namespace> -w

# Watch only Pod-related events
kubectl get events -n <namespace> -w --field-selector involvedObject.kind=Pod

# Warning events only (skip normal)
kubectl get events -A --field-selector type=Warning

# Events for a specific object
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pod-or-deployment-name>

# Events with wide output (shows source component)
kubectl get events -n <namespace> -o wide --sort-by=.lastTimestamp

Real-world pattern: When a deployment is not rolling out, run kubectl get events -n <ns> -w --field-selector involvedObject.kind=Pod in one terminal and kubectl get pods -n <ns> -w in another. The event stream shows in real time what is blocking each pod, for example a FailedScheduling event listing the scheduler's reasons for rejecting nodes.
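
Field selectors can be combined with commas, so the separate filters shown in this section can be folded into one call. The sketch below is one way to do that. The function name pod_warnings is hypothetical, and the selectors are the same involvedObject and type fields used above.

```shell
# pod_warnings: hypothetical helper that shows only Warning events for one pod.
# Usage: pod_warnings <namespace> <pod-name>
pod_warnings() {
  [ $# -eq 2 ] || { echo "usage: pod_warnings <namespace> <pod-name>" >&2; return 1; }
  kubectl get events -n "$1" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$2,type=Warning" \
    --sort-by=.lastTimestamp
}
```

Because the filtering happens on the API server, this stays fast even in namespaces with a large event backlog.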

🌐 Networking & Services

# List services
kubectl get svc -n <namespace>
kubectl get svc -A

# Describe a service (check endpoints, port mappings)
kubectl describe svc <service-name> -n <namespace>

# Check endpoints: if empty, the selector matches no pods, or the matching pods are not Ready
kubectl get endpoints <service-name> -n <namespace>

# Port-forward a service to localhost (useful for testing without ingress)
kubectl port-forward service/<service-name> 8080:80 -n <namespace>

# Port-forward a specific pod
kubectl port-forward pod/<pod-name> 8080:8080 -n <namespace>

# List ingresses
kubectl get ingress -n <namespace>
kubectl describe ingress <ingress-name> -n <namespace>

# DNS check from inside cluster (run in debug pod)
kubectl run dns-test --image=busybox --restart=Never -it --rm -- \
  nslookup <service-name>.<namespace>.svc.cluster.local

# Check network policies
kubectl get networkpolicy -n <namespace>
kubectl describe networkpolicy <name> -n <namespace>

🔒 Secrets & Certificates

# List secrets
kubectl get secrets -n <namespace>

# Decode a secret value (base64)
kubectl get secret <secret-name> -n <namespace> \
  -o jsonpath="{.data.<key>}" | base64 -d && echo

# Get ArgoCD initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo

# View all keys in a secret (without values)
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data}' | \
  python3 -c "import sys,json; [print(k) for k in json.load(sys.stdin)]"

# Create a TLS secret from cert and key files
kubectl create secret tls <secret-name> \
  --cert=tls.crt \
  --key=tls.key \
  -n <namespace>

# Create a generic secret
kubectl create secret generic <secret-name> \
  --from-literal=username=admin \
  --from-literal=password=mysecret \
  -n <namespace>

cert-manager Certificates

# List all certificates
kubectl get certificate -A

# Check if a certificate is Ready
kubectl get certificate -n <namespace>
# Ready column should show True

# Describe a certificate (shows renewal time, issuer, events)
kubectl describe certificate <cert-name> -n <namespace>

# Check CertificateRequests (shows signing history)
kubectl get certificaterequest -n <namespace>

# Check cert-manager pods are running
kubectl get pods -n cert-manager

# Check Issuer or ClusterIssuer status
kubectl get clusterissuer
kubectl describe clusterissuer <issuer-name>

# Check Linkerd identity issuer certificate
kubectl get certificate -n linkerd
kubectl describe certificate linkerd-identity-issuer -n linkerd

🔍 JSONPath & Output Formatting

JSONPath lets you extract exactly the field you need without parsing full YAML output.

# Get pod's node assignment
kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.spec.nodeName}'

# Get all pod names in a namespace
kubectl get pods -n <namespace> \
  -o jsonpath='{.items[*].metadata.name}'

# Get image used by each container in a deployment
kubectl get deployment <name> -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[*].image}'

# Get namespace labels (e.g. verify compute class label on GKE)
kubectl get namespace <namespace> \
  -o jsonpath='{.metadata.labels}'

# Get nodeSelector injected into a pod (GKE Autopilot check)
kubectl get pod <pod-name> -n <namespace> \
  -o yaml | grep -A 1 nodeSelector

# Check init container restart policies on the first pod in the namespace (Linkerd native sidecar check)
kubectl get pod -n <namespace> \
  -o jsonpath='{range .items[0].spec.initContainers[*]}{.name}: restartPolicy={.restartPolicy}{"\n"}{end}'

# Get all container images running across the cluster
kubectl get pods -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{range .spec.containers[*]}{.image}{"\n"}{end}{end}'

# Custom columns output
kubectl get pods -n <namespace> \
  -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,NODE:.spec.nodeName,IP:.status.podIP

Quick tip: Use -o yaml first to explore the full object structure, then build your JSONPath expression from the field paths you find.
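
As an illustration of that workflow, the -o yaml output of a pod shows restart counts under .status.containerStatuses[].restartCount, and that path can be dropped straight into a JSONPath range. The sketch below sums restarts per pod and prints only pods that have restarted. The function name pod_restarts is hypothetical.

```shell
# pod_restarts: hypothetical helper listing pods with at least one restart.
# Usage: pod_restarts <namespace>
pod_restarts() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].restartCount}{"\n"}{end}' |
    awk '{
      total = 0
      for (i = 2; i <= NF; i++) total += $i   # one count per container
      if (total > 0) print $1, total
    }'
}
```

Pods that have not yet reported container statuses produce a line with only a name and are skipped.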

⚙️ ArgoCD

Commands for managing GitOps applications via both kubectl and the argocd CLI.

Via kubectl

# Get all ArgoCD applications
kubectl -n argocd get applications

# Get ApplicationSets
kubectl -n argocd get applicationsets

# Describe an application (shows sync status, health, conditions)
kubectl -n argocd describe application <app-name>

# Check ArgoCD pods are healthy
kubectl get pods -n argocd

# Get initial admin password (first login)
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d && echo

# Port-forward ArgoCD UI to localhost
kubectl port-forward service/argocd-server -n argocd 8080:443

# Apply root App-of-Apps (cluster bootstrap)
kubectl apply -f bootstrap/<cluster>-root-app.yaml

# Watch application sync in real-time
kubectl -n argocd get applications -w

Via argocd CLI

# Login (after port-forward or via ingress)
argocd login localhost:8080 --insecure

# List all apps and their sync/health status
argocd app list

# Sync an application manually
argocd app sync <app-name>

# Force sync (uses a force apply; resources that cannot be patched may be deleted and recreated)
argocd app sync <app-name> --force

# Check application diff (what would change on next sync)
argocd app diff <app-name>

# Get application details
argocd app get <app-name>

# Hard refresh (re-fetches from Git)
argocd app get <app-name> --hard-refresh

Sync vs Health: An application can be Synced but Degraded — meaning ArgoCD applied all manifests but the workload itself is not healthy (e.g. pods in CrashLoopBackOff). Always check both columns in argocd app list.
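
The same two-column check can be done with kubectl alone, since the Application resource exposes both values under .status.sync.status and .status.health.status. A minimal sketch, assuming ArgoCD is installed in the argocd namespace as in the commands above; the function name argocd_unhealthy_apps is hypothetical.

```shell
# argocd_unhealthy_apps: hypothetical helper listing apps that are
# not both Synced and Healthy. Assumes ArgoCD runs in the argocd namespace.
argocd_unhealthy_apps() {
  kubectl -n argocd get applications --no-headers \
    -o custom-columns=NAME:.metadata.name,SYNC:.status.sync.status,HEALTH:.status.health.status |
    awk '$2 != "Synced" || $3 != "Healthy"'
}
```

An empty result means every application is Synced and Healthy.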

🔗 Linkerd Service Mesh

Commands for verifying injection, checking certificate health, and debugging mesh traffic.

# Check Linkerd control plane pods are running
kubectl get pods -n linkerd

# Verify Linkerd identity issuer certificate is Ready
kubectl get certificate -n linkerd
kubectl describe certificate linkerd-identity-issuer -n linkerd

# Verify issuer secret exists
kubectl get secret linkerd-identity-issuer -n linkerd

# Check if a namespace is meshed (inject annotation present)
kubectl get namespace <namespace> -o jsonpath='{.metadata.annotations}'

# Mesh a namespace — all new pods will get the Linkerd proxy injected
kubectl annotate namespace <namespace> linkerd.io/inject=enabled

# Un-mesh a namespace
kubectl annotate namespace <namespace> linkerd.io/inject=disabled --overwrite

# Check if a pod has the Linkerd proxy sidecar injected
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].name}'
# Should show: linkerd-proxy alongside your app container

# Verify native sidecar injection on the first pod in the namespace (K8s 1.29+: restartPolicy=Always on the init container)
kubectl get pod -n <namespace> \
  -o jsonpath='{range .items[0].spec.initContainers[*]}{.name}: restartPolicy={.restartPolicy}{"\n"}{end}'
# Expected: linkerd-proxy: restartPolicy=Always

Via linkerd CLI

# Check Linkerd installation health
linkerd check

# Check data plane (proxy) health in a namespace
linkerd check --proxy -n <namespace>

# Live traffic stats for all deployments in a namespace
linkerd viz stat deploy -n <namespace>

# Real-time request tap (sample live traffic)
linkerd viz tap deploy/<name> -n <namespace>

# Top routes by request volume
linkerd viz top deploy/<name> -n <namespace>

☁️ GKE-Specific

Commands and patterns specific to Google Kubernetes Engine, including Autopilot compute class management and workload identity.

Autopilot — Compute Class & Billing

# Set default compute class for a namespace (on-demand)
kubectl label namespace <namespace> \
  cloud.google.com/default-compute-class=autopilot --overwrite

# Set default compute class for a namespace (spot VMs — cheapest)
kubectl label namespace <namespace> \
  cloud.google.com/default-compute-class=autopilot-spot --overwrite

# Verify the label was applied
kubectl get namespace <namespace> \
  -o jsonpath='{.metadata.labels}'

# Verify nodeSelector was injected into a pod by GKE Autopilot
kubectl get pod <pod-name> -n <namespace> \
  -o yaml | grep -A 1 nodeSelector

# Remove compute class label (revert to default)
kubectl label namespace <namespace> \
  cloud.google.com/default-compute-class-

Node Pool & Cluster

# Get GKE cluster credentials (configure kubectl context)
gcloud container clusters get-credentials <cluster-name> \
  --region <region> --project <project-id>

# List the nodes in a specific node pool
kubectl get nodes -l cloud.google.com/gke-nodepool=<pool-name>

# Target a specific node pool via nodeSelector
# (in pod spec)
nodeSelector:
  cloud.google.com/gke-nodepool: <pool-name>

# Check Workload Identity annotation on a K8s service account
kubectl get serviceaccount <ksa-name> -n <namespace> \
  -o jsonpath='{.metadata.annotations}'

# List Google CAS ClusterIssuers (cert-manager + Google CAS integration)
kubectl get googlecasclusterissuer

# Check Google CAS issuer status
kubectl describe googlecasclusterissuer <issuer-name>

GKE Autopilot tip: When using Autopilot, label your namespaces with autopilot-spot for batch/non-critical workloads to significantly reduce costs. Use autopilot (on-demand) for production services that cannot tolerate spot preemption.
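
When several namespaces need the same class, the label command from this section can be wrapped in a loop. This is a sketch only. The function name set_compute_class is hypothetical, and it reuses the cloud.google.com/default-compute-class label key shown above.

```shell
# set_compute_class: hypothetical helper that applies one compute class
# label to several namespaces.
# Usage: set_compute_class <class> <namespace>...
# The body runs in a subshell so its variables stay local.
set_compute_class() (
  [ $# -ge 2 ] || { echo "usage: set_compute_class <class> <namespace>..." >&2; exit 1; }
  class=$1
  shift
  for ns in "$@"; do
    kubectl label namespace "$ns" \
      "cloud.google.com/default-compute-class=$class" --overwrite
  done
)
```

For example, set_compute_class autopilot-spot batch reports would label the (placeholder) namespaces batch and reports for spot capacity.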

Key Takeaways

Start with events, not logs. kubectl get events --sort-by=.lastTimestamp tells you what Kubernetes has been doing. Logs tell you what your app has been doing. For infrastructure issues, events come first.
kubectl describe is your best friend. It aggregates spec, status, and events in one view. For any resource that isn't behaving, describe it before going anywhere else.
Use --previous for crashed containers. If a container is in CrashLoopBackOff, kubectl logs --previous shows the logs from the last failed run — the current container may not have enough uptime to log anything useful.
Empty endpoints = broken service. If a Service isn't routing traffic, run kubectl get endpoints <svc>. An empty endpoints list means the selector matches no pods, or the pods it matches are not Ready.
Port-forward for quick testing. Before debugging ingress or DNS, use kubectl port-forward to confirm the service itself works. This takes ingress and DNS out of the picture.
ArgoCD Synced ≠ Healthy. Always check both the Sync and Health columns. A Synced but Degraded app means manifests were applied but pods aren't running correctly.
JSONPath over grep for automation. Use -o jsonpath in scripts for reliable field extraction instead of parsing human-readable output with grep/awk.
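
To follow up on an empty endpoints list, it helps to list the pods a Service's selector actually matches. The sketch below reads the selector with a go-template (JSONPath cannot iterate over map keys) and passes it to kubectl get pods -l. The function name svc_pods is hypothetical.

```shell
# svc_pods: hypothetical helper that lists the pods matched by a
# Service's selector.
# Usage: svc_pods <namespace> <service-name>
# The body runs in a subshell so its variables stay local.
svc_pods() (
  [ $# -eq 2 ] || { echo "usage: svc_pods <namespace> <service-name>" >&2; exit 1; }
  # Render the selector map as "key=value,key=value,"
  selector=$(kubectl get svc "$2" -n "$1" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}') || exit 1
  selector=${selector%,}   # drop the trailing comma
  [ -n "$selector" ] || { echo "service $2 has no selector" >&2; exit 1; }
  kubectl get pods -n "$1" -l "$selector" -o wide
)
```

If this prints no pods, the selector and the pod labels disagree. If it prints pods whose READY column is not full, the endpoints are empty because of readiness rather than labels.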