Kubernetes Exit Code 137: Resolving OOMKilled Pod Eviction and Memory Limit Errors
When a container within a Kubernetes pod terminates with Exit Code 137, it's a critical signal indicating that the container was forcibly shut down by the Linux kernel's Out Of Memory (OOM) killer. This often leads to pod eviction and service instability. Understanding and resolving this error is crucial for maintaining a stable and performant Kubernetes cluster.
🚨 Symptoms & Diagnosis
Identifying an Exit Code 137 event requires inspecting pod statuses and logs. Key indicators include:
- Pod status: `OOMKilled`
- Exit code: `137` (derived from `128 + signal_number`, where `signal_number` is `9` for `SIGKILL`)
- Underlying signal: `SIGKILL` (signal 9)
- Kernel log entry: `Out of memory: Killed process <pid> (<process-name>)` (exact wording varies by kernel version and application)
- `kubectl describe pod` output showing: `Last State: Terminated (OOMKilled)`
- Container state reason: `OOMKilled` with exit code 137
```bash
# Example kubectl describe pod output snippet
State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
  Started:      Mon, 29 Jan 2024 10:30:00 -0500
  Finished:     Mon, 29 Jan 2024 10:30:15 -0500
```
Root Cause: Exit Code 137 primarily signifies that a container consumed more memory than its `resources.limits.memory` specification allows, or that the node itself experienced memory exhaustion, prompting the Linux kernel's OOM killer to forcibly terminate the process with a `SIGKILL` signal. This can be caused by misconfigured limits, application memory leaks, or insufficient node resources.
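The `128 + signal_number` arithmetic is easy to reproduce locally, with no cluster involved. This minimal sketch starts a throwaway process, kills it with `SIGKILL`, and reads back the exit status the shell reports:

```bash
# Start a long-running process in the background
sleep 30 &
pid=$!

# Forcibly terminate it, exactly as the OOM killer would
kill -9 "$pid"

# Collect its exit status: 128 + 9 = 137
wait "$pid"
echo "exit status: $?"   # prints: exit status: 137
```

The same convention applies to any signal: a process killed by `SIGTERM` (signal 15) reports 143.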
🛠️ Solutions
Resolving Exit Code 137 involves a systematic approach, from immediate diagnosis to long-term resource management and application optimization.
Immediate Diagnosis & Log Inspection
Quickly identify Exit Code 137 and gather diagnostic context from pod logs and Kubernetes events.
- Confirm `Exit Code 137` on the affected pod.
- Retrieve the pod description to view the termination reason.
- Check Kubernetes cluster events for `OOMKilled` entries.
- Inspect pod logs for memory-related errors or warnings preceding termination.
```bash
# Confirm pod status and termination reason
kubectl describe pod <pod-name> -n <namespace>
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Review logs, specifically the previous container's logs after a restart
kubectl logs <pod-name> -n <namespace> --previous
kubectl logs <pod-name> -n <namespace> --tail=100
```
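If you script this check, the signature to look for in the `kubectl describe pod` output is the `OOMKilled` reason paired with exit code 137. The sketch below runs the grep against a hard-coded sample snippet, a stand-in for the live command's output:

```bash
# Hypothetical `kubectl describe pod` snippet; in practice, replace the
# variable with: describe_output=$(kubectl describe pod <pod-name> -n <namespace>)
describe_output='
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
'

# Flag the OOMKilled / 137 signature
if printf '%s' "$describe_output" | grep -q 'Reason:.*OOMKilled' &&
   printf '%s' "$describe_output" | grep -q 'Exit Code:.*137'; then
  echo "OOMKilled signature found (exit code 137)"
fi
```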
Quick Fix: Increase Memory Limit (Temporary Scaling)
Immediately increase the memory limit for the affected deployment or pod to stop OOMKilled errors, serving as a temporary mitigation while you investigate the root cause.
This approach provides immediate relief by giving the struggling container more resources. However, it's crucial to follow up with proper resource configuration and application profiling to avoid simply moving the problem elsewhere or over-provisioning resources.
- Edit the deployment resource specification.
- Locate the `resources.limits.memory` field.
- Increase the memory value (e.g., from `512Mi` to `1Gi`).
- Apply the change and monitor the pod restart.
```bash
# Example: Temporarily increase the memory limit for a deployment
kubectl set resources deployment <deployment-name> -n <namespace> --limits=memory=1Gi

# Monitor the rollout status and new pods
kubectl rollout status deployment/<deployment-name> -n <namespace>
kubectl get pods -n <namespace> -w
```
Permanent Fix: Configure Memory Requests & Limits
Establish proper memory requests and limits in the pod specification to prevent OOMKilled errors and enable accurate scheduler placement.
Setting appropriate memory requests and limits is fundamental for stable Kubernetes operations. Requests ensure adequate scheduling, and limits prevent resource monopolization and OOM evictions.
- Edit the deployment YAML manifest.
- Add `resources.requests.memory` (scheduler reservation) and `resources.limits.memory` (hard limit).
- Set requests to 70-80% of expected peak usage; set limits 20-30% higher than requests to allow burst capacity without over-provisioning.
- Apply the manifest and verify pod scheduling and resource allocation.
```yaml
# Example: Adding memory requests and limits to a container in a Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
  namespace: production
spec:
  selector:
    matchLabels:
      app: my-application   # Required; must match the template labels
  template:
    metadata:
      labels:
        app: my-application
    spec:
      containers:
      - name: my-app-container
        image: my-repo/my-app:latest
        resources:
          requests:
            memory: "512Mi"   # Guaranteed amount, used for scheduling
          limits:
            memory: "1Gi"     # Hard limit; container is OOMKilled if exceeded
```

```bash
# Apply the updated deployment manifest
kubectl apply -f deployment.yaml

# Verify the new resource limits and requests are applied
kubectl describe pod <pod-name> -n <namespace> | grep -A 5 'Limits\|Requests'
```
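The 70-80% / 20-30% sizing rule above reduces to quick shell arithmetic. A sketch, assuming a hypothetical measured peak of 640Mi (e.g., from `kubectl top pod` or your APM):

```bash
peak_mi=640   # hypothetical observed peak memory usage, in Mi

# Request at 80% of peak (what the scheduler reserves)
request_mi=$(( peak_mi * 80 / 100 ))

# Limit 25% above the request (burst headroom before an OOMKill)
limit_mi=$(( request_mi * 125 / 100 ))

echo "requests.memory: ${request_mi}Mi"   # prints: requests.memory: 512Mi
echo "limits.memory: ${limit_mi}Mi"       # prints: limits.memory: 640Mi
```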
Detect Memory Leaks & Optimize Application
Identify memory leaks in the application that cause gradual memory consumption growth, eventually leading to OOMKilled.
- Monitor memory usage trends over time using tools like Prometheus or other APM solutions.
- Analyze application logs for memory allocation patterns or warnings indicating increasing memory footprint.
- Profile the application with language-specific memory debugging tools (e.g., `pprof` for Go, `jmap` for Java, Valgrind for C/C++).
- Fix the memory leak in the application code, or upgrade to a patched version if using third-party software.
```bash
# Get real-time memory usage of a pod's containers
kubectl top pod <pod-name> -n <namespace> --containers

# Access the pod to perform in-container memory diagnostics
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Inside the pod: check memory usage
free -h
ps aux --sort=-%mem

# For Java applications (inside the pod):
jmap -heap <pid-of-java-process>

# For Go applications with pprof enabled (ensure pprof is imported):
# In your Go app: import _ "net/http/pprof"
# Then access http://localhost:6060/debug/pprof/heap (e.g., after port-forwarding)
```
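Before reaching for a language-level profiler, a coarse trend check over repeated samples often confirms a leak. This sketch pipes a hypothetical series of memory readings (in Mi, e.g., collected from `kubectl top pod` once a minute) through awk and reports whether usage only ever climbs:

```bash
# Hypothetical memory samples over time, in Mi
samples="210 228 251 274 300 333"

echo "$samples" | tr ' ' '\n' | awk '
  NR > 1 && $1 <= prev { steady = 1 }   # any dip or plateau breaks the trend
  { prev = $1 }
  END { print (steady ? "usage fluctuates" : "monotonic growth: possible leak") }
'
# prints: monotonic growth: possible leak
```

Steady growth alone is not proof of a leak (caches warm up too), but it tells you language-level profiling is worth the effort.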
Configure Horizontal Pod Autoscaling (HPA)
Scale the number of pod replicas based on memory demand to distribute load and prevent individual pod OOMKilled events due to fluctuating load.
- Ensure `metrics-server` is deployed in your cluster (required for the HPA to read resource metrics).
- Create an HPA resource targeting memory utilization for your deployment.
- Set a target memory percentage (e.g., 70%) and define `minReplicas` and `maxReplicas`.
- Monitor HPA scaling behavior to ensure it responds effectively to memory pressure.
```yaml
# Example: Horizontal Pod Autoscaler for memory utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-application   # Target your deployment here
  minReplicas: 2           # Minimum number of pods
  maxReplicas: 10          # Maximum number of pods
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70   # Target 70% average memory utilization
```

```bash
# Apply the HPA manifest (saved as hpa.yaml)
kubectl apply -f hpa.yaml

# Monitor the HPA status and scaling events
kubectl get hpa -n <namespace> -w
```
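To predict what the autoscaler will do, its documented scaling formula is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A sketch with hypothetical numbers against the 70% target in the manifest above:

```bash
current_replicas=2
current_utilization=95   # hypothetical average memory utilization, in percent
target_utilization=70    # the averageUtilization set in the HPA manifest

# Integer ceil(a / b) via (a + b - 1) / b
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "desired replicas: $desired"   # prints: desired replicas: 3
```

Here ceil(2 × 95 / 70) = ceil(2.71) = 3, so the HPA would scale from 2 to 3 replicas.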
Verify Restart Policy & Node Eviction Handling
Ensure the pod's restart policy is configured correctly, and monitor for node-level memory pressure, which can evict pods even when no single container exceeds its own limit.
- Check the pod's `restartPolicy` (`Always`, `OnFailure`, `Never`) to ensure it aligns with desired behavior after termination.
- For node-level OOM issues, inspect node memory-pressure conditions and events.
- Consider configuring Pod Disruption Budgets (PDBs) if you need to ensure a minimum number of healthy pods during voluntary disruptions.
- Regularly monitor node memory usage to prevent cluster-wide OOM scenarios.
```bash
# Check the restart policy of a specific pod
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.restartPolicy}'

# Inspect node conditions for memory pressure
kubectl describe node <node-name> | grep -A 10 'Conditions'

# Get current memory usage for all nodes
kubectl top nodes

# Check a specific node for memory pressure events
kubectl describe node <node-name> | grep -i 'memory\|pressure'
```
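When automating node checks, the field that matters is the `MemoryPressure` condition. This sketch greps a hard-coded sample of the Conditions block, a stand-in for live `kubectl describe node` output:

```bash
# Hypothetical Conditions snippet; in practice, replace the variable with:
# conditions=$(kubectl describe node <node-name>)
conditions='
  MemoryPressure   True    KubeletHasInsufficientMemory
  DiskPressure     False   KubeletHasNoDiskPressure
'

if printf '%s' "$conditions" | grep -Eq 'MemoryPressure[[:space:]]+True'; then
  echo "node is under memory pressure"   # printed only when the condition is True
fi
```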
🧩 Technical Context (Visualized)
Exit Code 137 is a direct result of the Linux kernel's Out Of Memory (OOM) killer terminating a process. When a container's memory consumption exceeds its `resources.limits.memory` value, the kernel's cgroup memory controller detects the breach and invokes the OOM killer, which sends `SIGKILL` (signal 9) to the offending process, yielding an exit code of 128 + 9 = 137.
```mermaid
graph TD
    A[Kubernetes Container Process] --> B{Memory Usage Exceeds limits.memory};
    B -- triggers --> C[Linux Kernel cgroup monitoring];
    C -- exceeds configured limit --> D{Linux OOM Killer Activated};
    D -- sends --> E["SIGKILL (signal 9)"];
    E -- terminates --> A;
    A -- reports --> F["Pod Status: OOMKilled"];
    F -- with --> G["Exit Code: 137"];
```
✅ Verification
After implementing any of the solutions, verify that Exit Code 137 errors are no longer occurring and that your pods are running stably.
- Check pod status: confirm the pod is running and healthy.
- Inspect last state: ensure the container's last state is not `Terminated (OOMKilled)`.
- Review recent logs: check for any new OOM-related messages.
- Monitor resource usage: verify memory consumption stays within limits.
- Check events: look for `OOMKilled` or eviction events.
```bash
# Check the pod's current and last known state
kubectl describe pod <pod-name> -n <namespace> | grep -E 'State|Reason|Exit Code|OOMKilled'

# Get the last terminated state via JSONPath (useful for scripting)
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'

# Review the most recent logs for any indication of the error
kubectl logs <pod-name> -n <namespace> --previous | tail -50

# Monitor current memory usage to ensure it's stable
kubectl top pod <pod-name> -n <namespace> --containers

# Check for any new OOM-related events
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>
```
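The JSONPath query above returns a small JSON object when the container was previously terminated. A sketch that checks it for the `OOMKilled` reason, with the JSON hard-coded here as a hypothetical stand-in for the live query result:

```bash
# Hypothetical output of the JSONPath query for a previously OOMKilled pod
last_state='{"exitCode":137,"reason":"OOMKilled","finishedAt":"2024-01-29T15:30:15Z"}'

case "$last_state" in
  *'"reason":"OOMKilled"'*) echo "container was OOMKilled; keep investigating" ;;
  *)                        echo "no OOMKilled in last state" ;;
esac
```

An empty result from the live query means the container has no prior terminated state, which is the outcome you want after a fix.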
📦 Prerequisites
To effectively diagnose and resolve Exit Code 137 errors, ensure you have the following:
- `kubectl` CLI (v1.20+) configured with access to your Kubernetes cluster.
- `kubeconfig` properly set up for target cluster access.
- `metrics-server` deployed in your cluster, required for `kubectl top` and Horizontal Pod Autoscaling to function.
- Sufficient RBAC permissions to describe pods, view logs, and edit deployments/HPAs.
- (Optional) Prometheus/Grafana or other APM tools for comprehensive memory trend analysis and alerting.