# Kubernetes Exit Code 137: How to Resolve OOM Issues
## 🚨 Symptoms & Diagnosis
When a Kubernetes pod terminates with Exit Code 137, its container's main process was killed with `SIGKILL`. In almost all cases the sender is the Linux kernel's Out Of Memory (OOM) killer: the container exceeded its memory limit and was forcefully terminated to protect the host node.

Look for error signatures like the following in the node's kernel log or in your pod descriptions:
```bash
dmesg | grep -i 'killed process'
```

Sample output:

```text
Out of memory: Killed process 12345 (my-app) total-vm:1048576kB, anon-rss:524288kB
```
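If you cannot reach the node for `dmesg`, the same verdict is usually visible from the Kubernetes side in the pod's last container state; `my-app-pod` below is a placeholder name:

```bash
# Show the last terminated state recorded for the first container in the pod
kubectl describe pod my-app-pod | grep -A 5 'Last State'

# Or pull just the termination reason and exit code via jsonpath
kubectl get pod my-app-pod \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
# Expected: OOMKilled 137
```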
Root Cause: Kubernetes containers exceeding their defined memory limits trigger the underlying Linux kernel's OOM killer. The kernel sends `SIGKILL` (signal 9) to the container process, and because processes terminated by a signal report an exit status of 128 + signal number, Kubernetes surfaces this as Exit Code 137 (128 + 9).
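The arithmetic is easy to reproduce locally, with no cluster involved, by killing a throwaway process with `SIGKILL` and reading back its exit status:

```bash
# Start a background process, SIGKILL it, and read its exit status
sleep 300 &
kill -9 $!
wait $!
echo $?   # prints 137 (128 + 9)
```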
## 🛠️ Solutions
Resolving Exit Code 137 involves both immediate mitigation strategies to restore service and long-term best practices to prevent recurrence.
### Increase Container Memory Limits
Immediate Mitigation: Scale and Adjust Memory
This quick fix involves identifying and restarting affected pods after increasing their memory limits. This provides immediate relief but should be followed by a proper analysis.
- **Identify affected pods:** Use `kubectl get pods` to find pods that are repeatedly restarting or showing an `OOMKilled` state.

- **Edit the deployment to increase memory limits:** Access the deployment specification and adjust the `resources.limits.memory` and `resources.requests.memory` for the affected container. Locate the `spec.template.spec.containers` section and modify the `resources` block (a command-line equivalent is sketched after this list).

- **Force pod restart:** Deleting the old OOMKilled pod will trigger the deployment controller to create a new one with the updated resource limits.

    Caution: Deleting a pod will momentarily interrupt service for that specific instance. Ensure you have sufficient replicas or a graceful shutdown mechanism if this is a production environment.

- **Monitor events:** Observe the new pod's status and events to confirm it starts successfully and doesn't get OOMKilled again.
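A minimal command sketch for these steps, assuming the deployment is named `my-app`, its container is named `app`, and the pod names are placeholders:

```bash
# 1. Find pods that are restarting or were OOMKilled
kubectl get pods -n default

# 2. Raise the container's memory request and limit on the deployment
kubectl set resources deployment/my-app -c app \
  --requests=memory=256Mi --limits=memory=512Mi

# 3. Delete the OOMKilled pod (changing the spec already triggers a rollout;
#    deleting simply clears a pod stuck in CrashLoopBackOff faster)
kubectl delete pod my-app-oldpod -n default

# 4. Watch the replacement pod and confirm it stays Running
kubectl get pods -n default -w
kubectl get events -n default --sort-by='.lastTimestamp' | grep -i oom
```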
### Implement Comprehensive Resource Management
Best Practice Fix: Resource Requests, Limits, and Automation
For long-term stability and efficient resource utilization, a robust resource management strategy is essential. This includes defining appropriate resource requests and limits, leveraging tools like the Vertical Pod Autoscaler (VPA), and setting namespace-level quotas.
- **Analyze historical usage:** Understand the actual memory consumption patterns of your applications. `kubectl top pods` gives a point-in-time snapshot; for more granular historical data, consider integrating Prometheus and Grafana (an install sketch follows this list).

- **Update all deployments with appropriate requests and limits:** Based on your analysis, set `requests` to the minimum required memory for the application to start and run effectively, and `limits` to the absolute maximum it should ever consume. A common best practice is to set `requests.memory` lower than `limits.memory` to allow for burst capacity, but ensure `limits.memory` is still enforced.

    ```yaml
    # Example deployment.yaml with proper resources
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      selector:
        matchLabels:
          app: my-app
      replicas: 3
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: app
            image: myapp:1.0
            resources:
              requests:
                memory: "256Mi"  # Guaranteed minimum
                cpu: "100m"
              limits:
                memory: "512Mi"  # Hard ceiling, prevents OOMKilled
                cpu: "500m"
            livenessProbe:
              exec:
                command: ['/bin/sh', '-c', 'ps aux | wc -l']
              initialDelaySeconds: 30
              timeoutSeconds: 5
    ```

    Apply this configuration with `kubectl apply -f deployment.yaml`.

- **Install Prometheus + kube-state-metrics for comprehensive monitoring:** These tools provide metrics on pod and node resource usage, helping you identify trends and potential bottlenecks (see the sketch after this list).

- **Deploy Vertical Pod Autoscaler (VPA):** VPA can automatically recommend or apply optimal resource requests and limits for your pods based on historical usage patterns, reducing manual overhead and preventing OOM events (an example manifest follows this list).

- **Set Namespace ResourceQuotas:** Enforce memory and CPU constraints at the namespace level to prevent any single team or application from consuming excessive cluster resources (an example quota follows this list).
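One common way to get Prometheus, Grafana, and kube-state-metrics running is the community `kube-prometheus-stack` Helm chart; the release name and namespace below are arbitrary choices, not requirements:

```bash
# Add the community chart repo and install the bundled monitoring stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```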
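A sketch of a VPA object for the example deployment above, assuming the VPA components (recommender, updater, admission controller) are already installed in the cluster; the min/max bounds are illustrative:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"    # "Off" = recommend only, never evict/resize pods
  resourcePolicy:
    containerPolicies:
    - containerName: app
      minAllowed:
        memory: 128Mi     # illustrative bounds, tune to your workload
      maxAllowed:
        memory: 1Gi
```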
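And a minimal ResourceQuota sketch for a namespace; the numbers are placeholders to adapt to your own capacity planning:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-quota
  namespace: default
spec:
  hard:
    requests.cpu: "4"        # sum of all CPU requests in the namespace
    requests.memory: 8Gi     # sum of all memory requests
    limits.cpu: "8"          # sum of all CPU limits
    limits.memory: 16Gi      # sum of all memory limits
```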
## 🧩 Technical Context (Visualized)
Kubernetes orchestrates containers managed by a Container Runtime Interface (CRI). When a container within a pod attempts to consume memory beyond its limits defined in the pod specification, the underlying Linux kernel's Out Of Memory (OOM) killer is invoked. This OOM killer intervenes by sending a SIGKILL (signal 9) to the container's process, forcefully terminating it. Kubernetes then registers this termination as Exit Code 137, signifying an OOMKilled event.
```mermaid
graph TD
    A[Pod Container Running] --> B{"Memory Usage > Resource Limit?"};
    B -- Yes --> C[Linux Kernel OOM Killer Activates];
    C --> D["Sends SIGKILL (Signal 9)"];
    D --> E[Container Process Terminated];
    E --> F["Kubernetes Reports Exit Code 137 (OOMKilled)"];
    B -- No --> A;
```
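To see the limit the kernel actually enforces, you can read the memory ceiling straight from the container's cgroup; `my-app-pod` is a placeholder, and the path depends on whether the node runs cgroup v2 or v1:

```bash
# cgroup v2 nodes: memory.max holds the enforced limit in bytes ("max" = unlimited)
kubectl exec my-app-pod -- cat /sys/fs/cgroup/memory.max

# cgroup v1 nodes: the equivalent value is memory.limit_in_bytes
kubectl exec my-app-pod -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes
```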
## ✅ Verification
After implementing solutions, use these commands to verify that your pods are running stably and no longer encountering OOM issues:
```bash
# Check specific pod status for OOMKilled or Exit Code 137
kubectl describe pod my-app-newpod -n default | grep -E 'OOMKilled|Exit Code'

# Review cluster events for OOMKilled warnings
kubectl get events --sort-by='.lastTimestamp' | grep OOM

# Monitor current pod resource usage
kubectl top pods -n default

# Monitor current node resource usage
kubectl top nodes

# Continuously observe pod states for any non-Running status
watch 'kubectl get pods -n default | grep -v Running'
```
## 📦 Prerequisites
To effectively diagnose and resolve Kubernetes Exit Code 137, ensure you have:
- `kubectl` 1.29+: For interacting with your Kubernetes cluster.
- Cluster-admin rights: Or equivalent permissions to modify deployments and view events/logs.
- `metrics-server` enabled: Required for `kubectl top` commands to function.
- Linux nodes with `dmesg` access: For direct kernel OOM logs, typically via SSH.
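A quick way to confirm these prerequisites before starting; `kube-system` is where `metrics-server` usually lives, though your cluster may differ:

```bash
# Client version
kubectl version --client

# Is metrics-server deployed and ready?
kubectl get deployment metrics-server -n kube-system

# Do I have the rights to modify deployments?
kubectl auth can-i update deployments --all-namespaces
```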