Kubernetes Exit Code 137: Resolving OOMKilled Pod Eviction and Memory Limit Errors
When a container within a Kubernetes pod terminates with Exit Code 137, it's a critical signal indicating that the container was forcibly shut down by the Linux kernel's Out Of Memory (OOM) killer. This often leads to pod eviction and service instability. Understanding and resolving this error is crucial for maintaining a stable and performant Kubernetes cluster.
🚨 Symptoms & Diagnosis
Identifying an Exit Code 137 event requires inspecting pod statuses and logs. Key indicators include:
- Pod status: `OOMKilled`
- Exit code: `137` (derived from `128 + signal_number`, where `signal_number` is `9` for `SIGKILL`)
- Underlying signal: `SIGKILL` (signal 9)
- Kernel log entry: `Out of memory: Killed process <pid> (<process-name>)` (exact wording varies by kernel version and application)
- `kubectl describe pod` output showing: `Last State: Terminated (OOMKilled)`
- Container state reason: `OOMKilled` with exit code 137
```bash
# Example kubectl describe pod output snippet
State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Mon, 29 Jan 2024 10:30:00 -0500
  Finished:     Mon, 29 Jan 2024 10:30:15 -0500
```
Root Cause: Exit Code 137 primarily signifies that a container consumed more memory than its `resources.limits.memory` specification allows, or that the node itself experienced memory exhaustion, prompting the Linux kernel's OOM killer to forcibly terminate the process with a `SIGKILL` signal. This can be caused by misconfigured limits, application memory leaks, or insufficient node resources.
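The `128 + signal_number` arithmetic is easy to reproduce locally, with no cluster involved. This minimal sketch starts a throwaway process, kills it with `SIGKILL`, and reads back the exit status the shell reports:

```bash
# Start a long-running process in the background
sleep 30 &
pid=$!

# Forcibly terminate it, exactly as the OOM killer would
kill -9 "$pid"

# Collect its exit status: 128 + 9 = 137
wait "$pid"
echo "exit status: $?"   # prints: exit status: 137
```

The same convention applies to any signal: a process killed by `SIGTERM` (signal 15) reports 143.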
🛠️ Solutions
Resolving Exit Code 137 involves a systematic approach, from immediate diagnosis to long-term resource management and application optimization.
Immediate Diagnosis & Log Inspection
Quickly identify Exit Code 137 and gather diagnostic context from pod logs and Kubernetes events.
- Confirm `Exit Code 137` on the affected pod.
- Retrieve the pod description to view the termination reason.
- Check Kubernetes cluster events for `OOMKilled` entries.
- Inspect pod logs for memory-related errors or warnings preceding termination.
```bash
# Confirm pod status and termination reason
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Review logs, specifically the previous container's logs after a restart
kubectl logs <pod-name> -n <namespace> --previous
kubectl logs <pod-name> -n <namespace> --tail=100
```
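If you script this check, the signature to look for in the `kubectl describe pod` output is the `OOMKilled` reason paired with exit code 137. The sketch below runs the grep against a hard-coded sample snippet, a stand-in for the live command's output:

```bash
# Hypothetical `kubectl describe pod` snippet; in practice, replace the
# variable with: describe_output=$(kubectl describe pod <pod-name> -n <namespace>)
describe_output='
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
'

# Flag the OOMKilled / 137 signature
if printf '%s' "$describe_output" | grep -q 'Reason:.*OOMKilled' &&
   printf '%s' "$describe_output" | grep -q 'Exit Code:.*137'; then
  echo "OOMKilled signature found (exit code 137)"
fi
```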
Quick Fix: Increase Memory Limit (Temporary Scaling)
Immediately increase the memory limit for the affected deployment or pod to stop OOMKilled errors, serving as a temporary mitigation while you investigate the root cause.
This approach provides immediate relief by giving the struggling container more resources. However, it's crucial to follow up with proper resource configuration and application profiling to avoid simply moving the problem elsewhere or over-provisioning resources.
- Edit the deployment resource specification.
- Locate the `resources.limits.memory` field.
- Increase the memory value (e.g., from `512Mi` to `1Gi`).
- Apply the change and monitor the pod restart.
```bash
# Example: Temporarily increase the memory limit for a deployment
kubectl set resources deployment <deployment-name> -n <namespace> --limits=memory=1Gi

# Monitor the rollout status and new pods
kubectl rollout status deployment/<deployment-name> -n <namespace>
kubectl get pods -n <namespace> -w
```
Permanent Fix: Configure Memory Requests & Limits
Establish proper memory requests and limits in the pod specification to prevent OOMKilled errors and enable accurate scheduler placement.
Setting appropriate memory requests and limits is fundamental for stable Kubernetes operations. Requests ensure adequate scheduling, and limits prevent resource monopolization and OOM evictions.
- Edit the deployment YAML manifest.
- Add `resources.requests.memory` (scheduler reservation) and `resources.limits.memory` (hard limit).
- Set requests to 70-80% of expected peak usage; set limits 20-30% higher than requests to allow burst capacity without over-provisioning.
- Apply the manifest and verify pod scheduling and resource allocation.
```yaml
# Example: Adding memory requests and limits to a container in a Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  namespace: production
spec:
  selector:
    matchLabels:
      app: my-application   # Required; must match the template labels
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
      - name: my-app-container
        image: my-repo/my-app:latest
        resources:
          requests:
            memory: "512Mi"   # Guaranteed amount, used for scheduling
          limits:
            memory: "1Gi"     # Hard limit; container is OOMKilled if exceeded
```

```bash
# Apply the updated deployment manifest
kubectl apply -f deployment.yaml

# Verify the new resource limits and requests are applied
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 'Limits\|Requests'
```
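The 70-80% / 20-30% sizing rule above reduces to quick shell arithmetic. A sketch, assuming a hypothetical measured peak of 640Mi (e.g., from `kubectl top pod` or your APM):

```bash
peak_mi=640   # hypothetical observed peak memory usage, in Mi

# Request at 80% of peak (what the scheduler reserves)
request_mi=$(( peak_mi * 80 / 100 ))

# Limit 25% above the request (burst headroom before an OOMKill)
limit_mi=$(( request_mi * 125 / 100 ))

echo "requests.memory: ${request_mi}Mi"   # prints: requests.memory: 512Mi
echo "limits.memory: ${limit_mi}Mi"       # prints: limits.memory: 640Mi
```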
Detect Memory Leaks & Optimize Application
Identify memory leaks in the application that cause gradual memory consumption growth, eventually leading to OOMKilled.
- Monitor memory usage trends over time using tools like Prometheus or other APM solutions.
- Analyze application logs for memory allocation patterns or warnings indicating increasing memory footprint.
- Profile the application with language-specific memory debugging tools (e.g., `pprof` for Go, `jmap` for Java, Valgrind for C/C++).
- Fix the memory leak in the application code, or upgrade to a patched version if using third-party software.
```bash
# Get real-time memory usage of a pod's containers
kubectl top pod <pod-name> -n <namespace> --containers

# Access the pod to perform in-container memory diagnostics
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Inside the pod: check memory usage
free -h
ps aux --sort=-%mem

# For Java applications (inside the pod):
jmap -heap <pid-of-java-process>

# For Go applications with pprof enabled (ensure pprof is imported):
# In your Go app: import _ "net/http/pprof"
# Then access http://localhost:6060/debug/pprof/heap (e.g., after port-forwarding)
```
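Before reaching for a language-level profiler, a coarse trend check over repeated samples often confirms a leak. This sketch pipes a hypothetical series of memory readings (in Mi, e.g., collected from `kubectl top pod` once a minute) through awk and reports whether usage only ever climbs:

```bash
# Hypothetical memory samples over time, in Mi
samples="210 228 251 274 300 333"

echo "$samples" | tr ' ' '\n' | awk '
  NR > 1 && $1 <= prev { steady = 1 }   # any dip or plateau breaks the trend
  { prev = $1 }
  END { print (steady ? "usage fluctuates" : "monotonic growth: possible leak") }
'
# prints: monotonic growth: possible leak
```

Steady growth alone is not proof of a leak (caches warm up too), but it tells you language-level profiling is worth the effort.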
Configure Horizontal Pod Autoscaling (HPA)
Scale the number of pod replicas based on memory demand to distribute load and prevent individual pod OOMKilled events due to fluctuating load.
- Ensure `metrics-server` is deployed in your cluster (required for the HPA to read resource metrics).
- Create an HPA resource targeting memory utilization for your deployment.
- Set a target memory percentage (e.g., 70%) and define `minReplicas` and `maxReplicas`.
- Monitor HPA scaling behavior to ensure it responds effectively to memory pressure.
```yaml
# Example: Horizontal Pod Autoscaler for memory utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-application   # Target your deployment here
  minReplicas: 2           # Minimum number of pods
  maxReplicas: 10          # Maximum number of pods
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70   # Target 70% average memory utilization
```

```bash
# Apply the HPA manifest (saved as hpa.yaml)
kubectl apply -f hpa.yaml

# Monitor the HPA status and scaling events
kubectl get hpa -n <namespace> -w
```
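To predict what the autoscaler will do, its documented scaling formula is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A sketch with hypothetical numbers against the 70% target in the manifest above:

```bash
current_replicas=2
current_utilization=95   # hypothetical average memory utilization, in percent
target_utilization=70    # the averageUtilization set in the HPA manifest

# Integer ceil(a / b) via (a + b - 1) / b
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "desired replicas: $desired"   # prints: desired replicas: 3
```

Here ceil(2 × 95 / 70) = ceil(2.71) = 3, so the HPA would scale from 2 to 3 replicas.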
Verify Restart Policy & Node Eviction Handling
Ensure the pod's restart policy is configured correctly, and monitor for node-level memory pressure, which can evict pods even when no single container exceeds its own limit.
- Check the pod's `restartPolicy` (`Always`, `OnFailure`, `Never`) to ensure it aligns with desired behavior after termination.
- For node-level OOM issues, inspect node memory-pressure conditions and events.
- Consider configuring Pod Disruption Budgets (PDBs) if you need to ensure a minimum number of healthy pods during voluntary disruptions.
- Regularly monitor node memory usage to prevent cluster-wide OOM scenarios.
```bash
# Check the restart policy of a specific pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.restartPolicy}'

# Inspect node conditions for memory pressure
kubectl describe node <node-name> | grep -A 10 'Conditions'

# Get current memory usage for all nodes
kubectl top nodes

# Check a specific node for memory pressure events
kubectl describe node <node-name> | grep -i 'memory\|pressure'
```
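When automating node checks, the field that matters is the `MemoryPressure` condition. This sketch greps a hard-coded sample of the Conditions block, a stand-in for live `kubectl describe node` output:

```bash
# Hypothetical Conditions snippet; in practice, replace the variable with:
# conditions=$(kubectl describe node <node-name>)
conditions='
  MemoryPressure   True    KubeletHasInsufficientMemory
  DiskPressure     False   KubeletHasNoDiskPressure
'

if printf '%s' "$conditions" | grep -Eq 'MemoryPressure[[:space:]]+True'; then
  echo "node is under memory pressure"   # printed only when the condition is True
fi
```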
🧩 Technical Context (Visualized)
Exit Code 137 is a direct result of the Linux kernel's Out Of Memory (OOM) killer terminating a process. When a container's memory consumption exceeds its `resources.limits.memory` value, the kernel's cgroup memory controller detects the breach and invokes the OOM killer, which sends `SIGKILL` (signal 9) to the offending process, yielding an exit code of 128 + 9 = 137.
```mermaid
graph TD
    A[Kubernetes Container Process] --> B{Memory Usage Exceeds limits.memory};
    B -- triggers --> C[Linux Kernel cgroup monitoring];
    C -- exceeds configured limit --> D{Linux OOM Killer Activated};
    D -- sends --> E["SIGKILL (signal 9)"];
    E -- terminates --> A;
    A -- reports --> F["Pod Status: OOMKilled"];
    F -- with --> G["Exit Code: 137"];
```
✅ Verification
After implementing any of the solutions, verify that Exit Code 137 errors are no longer occurring and that your pods are running stably.
- Check pod status: confirm the pod is running and healthy.
- Inspect last state: ensure the container's last state is not `Terminated (OOMKilled)`.
- Review recent logs: check for any new OOM-related messages.
- Monitor resource usage: verify memory consumption stays within limits.
- Check events: look for `OOMKilled` or eviction events.
```bash
# Check the pod's current and last known state
kubectl describe pod <pod-name> -n <namespace> | grep -E 'State|Reason|Exit Code|OOMKilled'

# Get the last terminated state via JSONPath (useful for scripting)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# Review the most recent logs for any indication of the error
kubectl logs <pod-name> -n <namespace> --previous | tail -50

# Monitor current memory usage to ensure it's stable
kubectl top pod <pod-name> -n <namespace> --containers

# Check for any new OOM-related events
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>
```
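The JSONPath query above returns a small JSON object when the container was previously terminated. A sketch that checks it for the `OOMKilled` reason, with the JSON hard-coded here as a hypothetical stand-in for the live query result:

```bash
# Hypothetical output of the JSONPath query for a previously OOMKilled pod
last_state='{"exitCode":137,"reason":"OOMKilled","finishedAt":"2024-01-29T15:30:15Z"}'

case "$last_state" in
  *'"reason":"OOMKilled"'*) echo "container was OOMKilled; keep investigating" ;;
  *)                        echo "no OOMKilled in last state" ;;
esac
```

An empty result from the live query means the container has no prior terminated state, which is the outcome you want after a fix.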
📦 Prerequisites
To effectively diagnose and resolve Exit Code 137 errors, ensure you have the following:
- `kubectl` CLI (v1.20+) configured with access to your Kubernetes cluster.
- `kubeconfig` properly set up for target cluster access.
- `metrics-server` deployed in your cluster, required for `kubectl top` and Horizontal Pod Autoscaling to function.
- Sufficient RBAC permissions to describe pods, view logs, and edit deployments/HPAs.
- (Optional) Prometheus/Grafana or other APM tools for comprehensive memory trend analysis and alerting.