Docker Container Exits with Code 137: Causes and Fixes
When a Docker container abruptly terminates with exit code 137, it signals a critical resource management issue, almost invariably an Out-Of-Memory (OOM) condition or an explicit SIGKILL. For SREs, diagnosing and rectifying this error is paramount for maintaining service stability and preventing cascading failures in containerized deployments. This article provides a definitive guide to understanding, diagnosing, and resolving exit code 137, ensuring robust container operations.
🚨 Symptoms & Diagnosis
An exit code 137 (128 + 9, where 9 is the signal number of SIGKILL) indicates that the process inside the container was terminated forcefully, typically by the Linux kernel's Out-Of-Memory (OOM) killer or by the SIGKILL that `docker stop` sends once its grace period expires. Identifying the exact trigger is critical for effective remediation.
Common error signatures include:
- `"OOMKilled": true` and `"ExitCode": 137` in the container's `State` block from `docker inspect`
- `Exited (137)` in the STATUS column of `docker ps -a`
- `Reason: OOMKilled` in a pod container's last state (`kubectl describe pod`)
- Kernel OOM-killer messages in `dmesg` or `journalctl -k` referencing the container's process
Root Cause: The primary cause is an Out-Of-Memory (OOM) condition: the container's memory consumption exceeds its allocated limit, prompting the Linux kernel's OOM killer to terminate the process via SIGKILL. Other factors include overly restrictive resource limits, memory leaks, sudden memory spikes, and improper signal handling during container shutdown.
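The mechanism is easy to reproduce deliberately. The sketch below assumes a stock `alpine` image; `tail /dev/zero` never sees a newline, so its buffer grows until the cgroup limit is hit and the kernel kills it:

```bash
# Reproduce an OOM kill on purpose: cap the container at 64 MiB and run a command
# that allocates memory without bound.
docker run --name oom-demo -m 64m alpine tail /dev/zero
echo $?   # docker run propagates the container's exit code: expect 137

# The runtime records both the exit code and the OOM flag:
docker inspect oom-demo --format='{{.State.ExitCode}} {{.State.OOMKilled}}'   # expect "137 true"
docker rm oom-demo
```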
🛠️ Solutions
Addressing exit code 137 requires a systematic approach, ranging from immediate mitigations for production incidents to strategic optimizations for long-term stability.
Immediate Diagnosis: Check Container Exit Status
Rapidly identify if the OOMKilled flag is set and retrieve container logs to confirm memory-related errors. This initial diagnosis is crucial for understanding the immediate context of the termination.
- Inspect the container's state to check the `OOMKilled` flag and exit code.
- Retrieve container logs to identify any memory-related error messages or warnings leading up to the termination.
- Check the host system's `dmesg` output for kernel OOM killer invocations specific to the container's processes.
- If available, verify the container's memory usage metrics at the time of failure.
docker inspect <container_id> | grep -A 5 '"State"'
docker logs <container_id> | tail -50
docker inspect <container_id> --format='{{json .State}}' | jq '.OOMKilled, .ExitCode'
sudo dmesg | grep -i 'oom\|killed' | tail -20
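If the workload runs under Kubernetes, the equivalent check reads the container's last terminated state from the pod status; a sketch assuming a single-container pod:

```bash
# Reason is "OOMKilled" and exit code is 137 when the kernel killed the container:
kubectl get pod <pod_name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
kubectl get pod <pod_name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
# Or read the same fields from the human-readable description:
kubectl describe pod <pod_name> | grep -A 5 'Last State'
```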
!!! tip "Immediate Mitigation: Increase Container Memory Limit"
Immediately raise the memory allocation for the container to prevent further OOMKilled terminations. This is a temporary solution for production outages and provides headroom for deeper analysis.
- Stop the affected container.
- Update the `docker-compose.yml` file or Kubernetes manifest to increase the `mem_limit` (Docker) or `limits.memory` (Kubernetes).
- Restart the container with the new resource limits.
- Monitor the container's memory usage and application performance to confirm stability.
# Apply the higher limit to the owning Deployment (a bare pod's resources normally
# cannot be edited in place); this triggers a rolling update:
kubectl set resources deployment/<deployment_name> -c <container_name> --limits=memory=2Gi

# Alternatively, edit the deployment manifest (e.g., via `kubectl edit deployment/<deployment_name>`),
# update 'resources.limits.memory' under spec.template.spec.containers[].resources,
# and roll the change out:
kubectl rollout restart deployment/<deployment_name>
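For plain Docker (outside an orchestrator), the same mitigation can be applied with `docker update` or by recreating the container with a higher limit; a sketch using a hypothetical container named `my-app`:

```bash
# Raise the limit in place; --memory-swap usually needs to be raised alongside --memory:
docker update --memory=2g --memory-swap=2g my-app

# Or recreate the container with the higher limit baked in:
docker stop my-app && docker rm my-app
docker run -d --name my-app --memory=2g --memory-swap=2g your-app-image:latest
```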
!!! success "Best Practice Fix: Optimize Application Memory Usage"
Identify and eliminate memory leaks within the application code itself. This is the most robust, long-term solution for production stability and resource efficiency.
- Profile application memory usage using language-specific tools to pinpoint excessive consumption or growth patterns.
- Review the codebase for common memory leak patterns, such as unclosed database connections, unreleased file handles, unmanaged object caches, or unbounded data structures (arrays, maps).
- Implement proper resource cleanup mechanisms, utilizing `finally` blocks, try-with-resources (Java), context managers (Python), or equivalent patterns.
- Integrate granular memory monitoring and alerting for your application to detect regressions or new leaks early (a quick container-level sampling check is sketched below).
- Thoroughly test the optimized application under various load conditions before deploying to production.
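Application profilers are language-specific, but a coarse leak signature can be spotted from outside the container: memory that climbs monotonically under steady load. A minimal sampling loop (hypothetical container ID and sample count) might look like this:

```bash
# Sample the container's memory usage once a minute for 30 minutes; steadily rising
# MemUsage under constant load is a strong hint of a leak.
for i in $(seq 1 30); do
  echo "$(date -u +%FT%TZ) $(docker stats <container_id> --no-stream --format '{{.MemUsage}}')" >> mem-samples.log
  sleep 60
done
```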
!!! success "Best Practice Fix: Configure Proper Resource Requests and Limits"
Set realistic memory requests and limits based on observed application behavior. This prevents OOMKilled events by providing adequate resources and ensures efficient scheduling of pods/containers by the orchestrator.
- Profile the application under expected peak load to establish a baseline and maximum memory requirement.
- Set `requests.memory` to approximately 80% of the baseline average memory usage.
- Set `limits.memory` to 120-150% of the peak observed memory usage, providing a buffer without over-provisioning.
- Apply these resource definitions to your `docker-compose.yml` or Kubernetes deployment manifests.
- Continuously monitor actual resource utilization and adjust limits iteratively as application behavior evolves.
# Kubernetes deployment example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: your-app-image:latest
        resources:
          requests:
            memory: "512Mi"  # Used for scheduling; the node must have this much allocatable memory
          limits:
            memory: "1Gi"    # The container is OOM-killed (exit 137) if it exceeds this
Fix: Enable Swap and Configure OOM Behavior
While generally not recommended for performance-critical containers, enabling swap on the host and configuring container swap limits can provide a memory overflow buffer. Adjusting the kernel's `vm.overcommit_memory` parameter can also influence OOM killer behavior.
Host System Modification & Data Loss Warning
Enabling or modifying swap space directly on the host system can impact overall system performance and stability. Incorrect configuration can lead to system freezes or data loss. Proceed with caution and ensure proper backups. For Kubernetes environments, configuring swap is typically managed at the node level, and its effects on pods can be complex.
- Enable swap on the host system if it's not already configured.
- Configure a `memswap_limit` for your container in `docker-compose.yml` or via `docker run` flags. Note that Kubernetes generally disables swap for containers by default.
- Adjust the `vm.overcommit_memory` kernel parameter on the host. `vm.overcommit_memory=1` allows the kernel to overcommit memory, potentially delaying OOM, but risks system instability.
- For extremely critical containers, the `oom_kill_disable` flag can be set, but this risks system instability if that container consumes all available memory. This requires the `SYS_RESOURCE` capability.
# Kubernetes pod example with oom_kill_disable (requires SYS_RESOURCE)
apiVersion: v1
kind: Pod
metadata:
  name: app-critical
spec:
  containers:
  - name: app
    image: your-app-image:latest
    securityContext:
      capabilities:
        add:
        - SYS_RESOURCE  # Required for oom_kill_disable
# Note: oom_kill_disable is not directly exposed in Kubernetes resource limits.
# It needs to be configured at the container runtime level (e.g., containerd/cgroup settings)
# and is generally NOT recommended due to the risk of starving the host.
# This example is conceptual; practical implementation is complex and often avoided.
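With plain Docker, the swap and OOM knobs from the list above are directly available as `docker run` flags and a host sysctl; a cautionary sketch:

```bash
# 1 GiB RAM limit plus 1 GiB of swap (--memory-swap is the combined RAM+swap total):
docker run -d --name app --memory=1g --memory-swap=2g your-app-image:latest

# Exempt a single container from the OOM killer; only sensible together with a hard
# memory limit, and still risks starving the host:
docker run -d --name app-critical --memory=1g --oom-kill-disable your-app-image:latest

# Host-level: let the kernel overcommit memory (may delay OOM, can destabilize the host):
sudo sysctl vm.overcommit_memory=1
```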
Fix: Correct Entrypoint Signal Handling
An exit code 137 can also occur when `docker stop` is issued and the container's main process does not properly handle `SIGTERM` (signal 15) before the default 10-second grace period expires, leading to a forceful `SIGKILL` (signal 9).
- Verify that your Dockerfile's `ENTRYPOINT` or `CMD` uses `exec` so the main application process receives signals directly, rather than a shell wrapper.
- Ensure your application runs in the foreground, not as a background daemon, allowing it to be the PID 1 process that receives signals.
- Implement robust signal handlers within your application to perform graceful shutdowns (e.g., flush logs, close connections) when `SIGTERM` is received. (A shell-level wrapper variant is sketched at the end of this section.)
- Test the graceful shutdown process by running `docker stop <container_id>`.
# Correct Dockerfile entrypoint example for Nginx:
FROM nginx:latest
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["nginx", "-g", "daemon off;"] # Nginx runs in the foreground

# entrypoint.sh example (ensure 'exec' is used):
#!/bin/bash
set -e # Exit immediately if a command exits with a non-zero status
# Add pre-start logic here if needed
# Crucial: 'exec' replaces the shell with the application process, so the application
# becomes PID 1 and receives SIGTERM directly instead of the shell absorbing it.
exec "$@" # Runs the CMD arguments (nginx -g 'daemon off;') in place of the shell
# Test graceful shutdown:
docker run -d --name test-nginx nginx:latest
sleep 2 # Give container time to start
docker stop test-nginx # This sends SIGTERM, then SIGKILL after grace period
docker inspect test-nginx | grep ExitCode
# Expected ExitCode: 0 if graceful shutdown, 137 if SIGKILL after timeout
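When the main process genuinely cannot be `exec`'d directly (for example, a wrapper script has to supervise a child), a trap-based wrapper is a common alternative; a minimal sketch with a placeholder `your-app` command:

```bash
#!/bin/bash
# entrypoint-wrapper.sh (hypothetical): keep the shell as PID 1 but forward
# SIGTERM/SIGINT to the child and wait for it, so `docker stop` completes within
# the grace period instead of escalating to SIGKILL (exit 137).
set -e
your-app &                                   # placeholder for the real application command
child=$!
trap 'kill -TERM "$child" 2>/dev/null; wait "$child"' TERM INT
wait "$child"
```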
Fix: Implement Health Checks and Monitoring
Proactive health checks and robust monitoring can detect memory issues before they escalate to an OOMKilled event, enabling automated remediation or timely human intervention.
- Add liveness and readiness probes (Kubernetes) or a `healthcheck` (`docker-compose.yml`) to check application health, potentially including memory indicators.
- Configure memory threshold alerts (e.g., 80% of `limits.memory`) in your monitoring system (Prometheus, Grafana, Datadog) to warn of impending OOM conditions.
- Implement graceful degradation or auto-restart logic for services experiencing resource contention.
- Set up centralized logging for application and system memory metrics to aid in post-mortem analysis.
# Kubernetes deployment example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: your-app-image:latest
        livenessProbe:
          httpGet:
            path: /health   # Endpoint that indicates the app is running
            port: 8080
          initialDelaySeconds: 30  # Time before the first check
          periodSeconds: 10        # How often to check
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready    # Endpoint that indicates the app is ready to serve traffic
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 1
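For plain Docker, a rough equivalent of the liveness probe above is a container health check defined at run time (assuming the image ships `curl` and exposes the same hypothetical `/health` endpoint):

```bash
docker run -d --name app \
  --health-cmd='curl -fsS http://localhost:8080/health || exit 1' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=30s \
  your-app-image:latest

# Unlike a Kubernetes liveness probe, Docker only marks the container "unhealthy";
# acting on that status needs a restart policy or an external watcher (e.g. `docker events`).
```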
🧩 Technical Context (Visualized)
The Linux kernel's Out-Of-Memory (OOM) killer is a critical component of its memory management system, designed to prevent system collapse when memory is exhausted. When a process (or in our case, a container) attempts to allocate memory beyond its cgroup limit (set by Docker/Kubernetes), the kernel steps in, identifying and terminating a "guilty" process to free up resources. This termination is enforced via a SIGKILL (signal 9), leading directly to the exit code 137 (128 + 9).
graph TD
    A[Container Process Requests Memory] --> B{Memory Usage Exceeds Limit?}
    B -- No --> C[Process Continues Running]
    B -- Yes --> D[Linux Kernel OOM Killer Invoked]
    D --> E["OOM Killer Selects and Terminates a 'Guilty' Process"]
    E --> F["Kernel Sends SIGKILL (Signal 9)"]
    F --> G[Container Process Receives SIGKILL]
    G --> H[Container Process Exits]
    H --> I[Container Runtime Reports Exit Code 137]
    I --> J["Container Runtime Also Reports OOMKilled: true"]
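The limit the OOM killer enforces is simply the value Docker writes into the container's memory cgroup at start-up; it can be read back from inside the container (file paths differ between cgroup v1 and v2):

```bash
# Expect 268435456 (256 MiB) on either cgroup hierarchy:
docker run --rm -m 256m alpine sh -c \
  'cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes'
```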
✅ Verification
After implementing any of the solutions, verify the container's stability and exit status.
- Check the container's state and OOMKilled status.
- Review logs for any memory-related warnings.
- Monitor real-time memory usage.
- For Kubernetes, examine pod descriptions and previous logs.
- Check host-level `dmesg` or `journalctl` for OOM killer activity.
docker inspect <container_id> --format='{{json .State}}' | jq '.OOMKilled, .ExitCode, .Error'
docker logs <container_id> | grep -i 'memory\|oom\|killed'
docker stats <container_id> --no-stream
kubectl describe pod <pod_name> | grep -A 10 'State\|Last State'
kubectl logs <pod_name> --previous
sudo journalctl -u docker -n 50 | grep -i 'oom\|killed'
docker run --rm <image_id> /bin/sh -c 'free -h && ps aux'
kubectl top nodes
kubectl top pods -A --sort-by=memory
📦 Prerequisites
To effectively diagnose and resolve exit code 137, ensure you have the following tools and versions:
- Container Runtimes: Docker 18.09+ or Kubernetes 1.14+
- Operating System: Linux kernel 4.15+ (for robust cgroup v2 and OOM killer support)
- CLI Tools: `curl` or `wget` for health checks, `jq` for JSON parsing (highly recommended)
- Permissions: `sudo` access on the host for kernel tuning and `dmesg`/`journalctl` access
- Orchestration: `docker-compose` 1.25+ or `kubectl` 1.20+
- Profiling Tools: Language-specific memory profiling tools (e.g., `memory-profiler` for Python, Chrome DevTools for Node.js, `jmap` for Java)