Docker Container Exits with Code 137: Causes and Fixes
When a Docker container abruptly terminates with exit code 137, it signals a critical resource management issue, almost invariably an Out-Of-Memory (OOM) condition or an explicit SIGKILL. For SREs, diagnosing and rectifying this error is paramount for maintaining service stability and preventing cascading failures in containerized deployments. This article provides a definitive guide to understanding, diagnosing, and resolving exit code 137, ensuring robust container operations.
🚨 Symptoms & Diagnosis
An exit code 137 (128 + 9, where 9 is the signal number of SIGKILL) indicates that the process inside the container was terminated forcefully, typically by the Linux kernel's Out-Of-Memory (OOM) killer or by the SIGKILL that `docker stop` sends once its grace period expires. Identifying the exact trigger is critical for effective remediation.
Common error signatures include:
- `"OOMKilled": true` and `"ExitCode": 137` in the container's `State` block from `docker inspect`
- `Exited (137)` in the STATUS column of `docker ps -a`
- `Reason: OOMKilled` in a pod container's last state (`kubectl describe pod`)
- Kernel OOM-killer messages in `dmesg` or `journalctl -k` referencing the container's process
Root Cause: The primary cause is an Out-Of-Memory (OOM) condition: the container's memory consumption exceeds its allocated limit, prompting the Linux kernel's OOM killer to terminate the process via SIGKILL. Other factors include overly restrictive resource limits, memory leaks, sudden memory spikes, and improper signal handling during container shutdown.
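The mechanism is easy to reproduce deliberately. The sketch below assumes a stock `alpine` image; `tail /dev/zero` never sees a newline, so its buffer grows until the cgroup limit is hit and the kernel kills it:

```bash
# Reproduce an OOM kill on purpose: cap the container at 64 MiB and run a command
# that allocates memory without bound.
docker run --name oom-demo -m 64m alpine tail /dev/zero
echo $?   # docker run propagates the container's exit code: expect 137

# The runtime records both the exit code and the OOM flag:
docker inspect oom-demo --format='{{.State.ExitCode}} {{.State.OOMKilled}}'   # expect "137 true"
docker rm oom-demo
```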
🛠️ Solutions
Addressing exit code 137 requires a systematic approach, ranging from immediate mitigations for production incidents to strategic optimizations for long-term stability.
Immediate Diagnosis: Check Container Exit Status
Rapidly identify if the OOMKilled flag is set and retrieve container logs to confirm memory-related errors. This initial diagnosis is crucial for understanding the immediate context of the termination.
- Inspect the container's state to check the `OOMKilled` flag and exit code.
- Retrieve container logs to identify any memory-related error messages or warnings leading up to the termination.
- Check the host system's `dmesg` output for kernel OOM killer invocations specific to the container's processes.
- If available, verify the container's memory usage metrics at the time of failure.
docker inspect <container_id> | grep -A 5 '"State"'
docker logs <container_id> | tail -50
docker inspect <container_id> --format='{{json .State}}' | jq '.OOMKilled, .ExitCode'
sudo dmesg | grep -i 'oom\|killed' | tail -20
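If the workload runs under Kubernetes, the equivalent check reads the container's last terminated state from the pod status; a sketch assuming a single-container pod:

```bash
# Reason is "OOMKilled" and exit code is 137 when the kernel killed the container:
kubectl get pod <pod_name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'
kubectl get pod <pod_name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
# Or read the same fields from the human-readable description:
kubectl describe pod <pod_name> | grep -A 5 'Last State'
```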
!!! tip "Immediate Mitigation: Increase Container Memory Limit"
Immediately raise the memory allocation for the container to prevent further OOMKilled terminations. This is a temporary solution for production outages and provides headroom for deeper analysis.
- Stop the affected container.
- Update the `docker-compose.yml` file or Kubernetes manifest to increase the `mem_limit` (Docker) or `limits.memory` (Kubernetes).
- Restart the container with the new resource limits.
- Monitor the container's memory usage and application performance to confirm stability.
# Apply the higher limit to the owning Deployment (a bare pod's resources normally
# cannot be edited in place); this triggers a rolling update:
kubectl set resources deployment/<deployment_name> -c <container_name> --limits=memory=2Gi

# Alternatively, edit the deployment manifest (e.g., via `kubectl edit deployment/<deployment_name>`),
# update 'resources.limits.memory' under spec.template.spec.containers[].resources,
# and roll the change out:
kubectl rollout restart deployment/<deployment_name>
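For plain Docker (outside an orchestrator), the same mitigation can be applied with `docker update` or by recreating the container with a higher limit; a sketch using a hypothetical container named `my-app`:

```bash
# Raise the limit in place; --memory-swap usually needs to be raised alongside --memory:
docker update --memory=2g --memory-swap=2g my-app

# Or recreate the container with the higher limit baked in:
docker stop my-app && docker rm my-app
docker run -d --name my-app --memory=2g --memory-swap=2g your-app-image:latest
```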
!!! success "Best Practice Fix: Optimize Application Memory Usage"
Identify and eliminate memory leaks within the application code itself. This is the most robust, long-term solution for production stability and resource efficiency.
- Profile application memory usage using language-specific tools to pinpoint excessive consumption or growth patterns.
- Review the codebase for common memory leak patterns, such as unclosed database connections, unreleased file handles, unmanaged object caches, or unbounded data structures (arrays, maps).
- Implement proper resource cleanup mechanisms, utilizing `finally` blocks, try-with-resources (Java), context managers (Python), or equivalent patterns.
- Integrate granular memory monitoring and alerting for your application to detect regressions or new leaks early (a quick container-level sampling check is sketched below).
- Thoroughly test the optimized application under various load conditions before deploying to production.
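Application profilers are language-specific, but a coarse leak signature can be spotted from outside the container: memory that climbs monotonically under steady load. A minimal sampling loop (hypothetical container ID and sample count) might look like this:

```bash
# Sample the container's memory usage once a minute for 30 minutes; steadily rising
# MemUsage under constant load is a strong hint of a leak.
for i in $(seq 1 30); do
  echo "$(date -u +%FT%TZ) $(docker stats <container_id> --no-stream --format '{{.MemUsage}}')" >> mem-samples.log
  sleep 60
done
```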
!!! success "Best Practice Fix: Configure Proper Resource Requests and Limits"
Set realistic memory requests and limits based on observed application behavior. This prevents OOMKilled events by providing adequate resources and ensures efficient scheduling of pods/containers by the orchestrator.
- Profile the application under expected peak load to establish a baseline and maximum memory requirement.
- Set `requests.memory` to approximately 80% of the baseline average memory usage.
- Set `limits.memory` to 120-150% of the peak observed memory usage, providing a buffer without over-provisioning.
- Apply these resource definitions to your `docker-compose.yml` or Kubernetes deployment manifests.
- Continuously monitor actual resource utilization and adjust limits iteratively as application behavior evolves.
# Kubernetes deployment example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: your-app-image:latest
        resources:
          requests:
            memory: "512Mi"  # Used for scheduling; the node must have this much allocatable memory
          limits:
            memory: "1Gi"    # The container is OOM-killed (exit 137) if it exceeds this
Fix: Enable Swap and Configure OOM Behavior
While generally not recommended for performance-critical containers, enabling swap on the host and configuring container swap limits can provide a memory overflow buffer. Adjusting the kernel's `vm.overcommit_memory` parameter can also influence OOM killer behavior.
Host System Modification & Data Loss Warning
Enabling or modifying swap space directly on the host system can impact overall system performance and stability. Incorrect configuration can lead to system freezes or data loss. Proceed with caution and ensure proper backups. For Kubernetes environments, configuring swap is typically managed at the node level, and its effects on pods can be complex.
- Enable swap on the host system if it's not already configured.
- Configure a `memswap_limit` for your container in `docker-compose.yml` or via `docker run` flags. Note that Kubernetes generally disables swap for containers by default.
- Adjust the `vm.overcommit_memory` kernel parameter on the host. `vm.overcommit_memory=1` allows the kernel to overcommit memory, potentially delaying OOM, but risks system instability.
- For extremely critical containers, the `oom_kill_disable` flag can be set, but this risks system instability if that container consumes all available memory. This requires the `SYS_RESOURCE` capability.
# Kubernetes pod example with oom_kill_disable (requires SYS_RESOURCE)
apiVersion: v1
kind: Pod
metadata:
  name: app-critical
spec:
  containers:
  - name: app
    image: your-app-image:latest
    securityContext:
      capabilities:
        add:
        - SYS_RESOURCE  # Required for oom_kill_disable
# Note: oom_kill_disable is not directly exposed in Kubernetes resource limits.
# It needs to be configured at the container runtime level (e.g., containerd/cgroup settings)
# and is generally NOT recommended due to the risk of starving the host.
# This example is conceptual; practical implementation is complex and often avoided.
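With plain Docker, the swap and OOM knobs from the list above are directly available as `docker run` flags and a host sysctl; a cautionary sketch:

```bash
# 1 GiB RAM limit plus 1 GiB of swap (--memory-swap is the combined RAM+swap total):
docker run -d --name app --memory=1g --memory-swap=2g your-app-image:latest

# Exempt a single container from the OOM killer; only sensible together with a hard
# memory limit, and still risks starving the host:
docker run -d --name app-critical --memory=1g --oom-kill-disable your-app-image:latest

# Host-level: let the kernel overcommit memory (may delay OOM, can destabilize the host):
sudo sysctl vm.overcommit_memory=1
```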
Fix: Correct Entrypoint Signal Handling
An exit code 137 can also occur when `docker stop` is issued and the container's main process does not properly handle `SIGTERM` (signal 15) before the default 10-second grace period expires, leading to a forceful `SIGKILL` (signal 9).
- Verify that your Dockerfile's `ENTRYPOINT` or `CMD` uses `exec` so the main application process receives signals directly, rather than a shell wrapper.
- Ensure your application runs in the foreground, not as a background daemon, allowing it to be the PID 1 process that receives signals.
- Implement robust signal handlers within your application to perform graceful shutdowns (e.g., flush logs, close connections) when `SIGTERM` is received. (A shell-level wrapper variant is sketched at the end of this section.)
- Test the graceful shutdown process by running `docker stop <container_id>`.
# Correct Dockerfile entrypoint example for Nginx:
FROM nginx:latest
COPY entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["nginx", "-g", "daemon off;"] # Nginx runs in the foreground

# entrypoint.sh example (ensure 'exec' is used):
#!/bin/bash
set -e # Exit immediately if a command exits with a non-zero status
# Add pre-start logic here if needed
# Crucial: 'exec' replaces the shell with the application process, so the application
# becomes PID 1 and receives SIGTERM directly instead of the shell absorbing it.
exec "$@" # Runs the CMD arguments (nginx -g 'daemon off;') in place of the shell
# Test graceful shutdown:
docker run -d --name test-nginx nginx:latest
sleep 2 # Give container time to start
docker stop test-nginx # This sends SIGTERM, then SIGKILL after grace period
docker inspect test-nginx | grep ExitCode
# Expected ExitCode: 0 if graceful shutdown, 137 if SIGKILL after timeout
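When the main process genuinely cannot be `exec`'d directly (for example, a wrapper script has to supervise a child), a trap-based wrapper is a common alternative; a minimal sketch with a placeholder `your-app` command:

```bash
#!/bin/bash
# entrypoint-wrapper.sh (hypothetical): keep the shell as PID 1 but forward
# SIGTERM/SIGINT to the child and wait for it, so `docker stop` completes within
# the grace period instead of escalating to SIGKILL (exit 137).
set -e
your-app &                                   # placeholder for the real application command
child=$!
trap 'kill -TERM "$child" 2>/dev/null; wait "$child"' TERM INT
wait "$child"
```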
Fix: Implement Health Checks and Monitoring
Proactive health checks and robust monitoring can detect memory issues before they escalate to an OOMKilled event, enabling automated remediation or timely human intervention.
- Add liveness and readiness probes (Kubernetes) or a `healthcheck` (`docker-compose.yml`) to check application health, potentially including memory indicators.
- Configure memory threshold alerts (e.g., 80% of `limits.memory`) in your monitoring system (Prometheus, Grafana, Datadog) to warn of impending OOM conditions.
- Implement graceful degradation or auto-restart logic for services experiencing resource contention.
- Set up centralized logging for application and system memory metrics to aid in post-mortem analysis.
# Kubernetes deployment example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: your-app-image:latest
        livenessProbe:
          httpGet:
            path: /health   # Endpoint that indicates the app is running
            port: 8080
          initialDelaySeconds: 30  # Time before the first check
          periodSeconds: 10        # How often to check
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready    # Endpoint that indicates the app is ready to serve traffic
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 1
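For plain Docker, a rough equivalent of the liveness probe above is a container health check defined at run time (assuming the image ships `curl` and exposes the same hypothetical `/health` endpoint):

```bash
docker run -d --name app \
  --health-cmd='curl -fsS http://localhost:8080/health || exit 1' \
  --health-interval=10s \
  --health-timeout=5s \
  --health-retries=3 \
  --health-start-period=30s \
  your-app-image:latest

# Unlike a Kubernetes liveness probe, Docker only marks the container "unhealthy";
# acting on that status needs a restart policy or an external watcher (e.g. `docker events`).
```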
🧩 Technical Context (Visualized)
The Linux kernel's Out-Of-Memory (OOM) killer is a critical component of its memory management system, designed to prevent system collapse when memory is exhausted. When a process (or in our case, a container) attempts to allocate memory beyond its cgroup limit (set by Docker/Kubernetes), the kernel steps in, identifying and terminating a "guilty" process to free up resources. This termination is enforced via a SIGKILL (signal 9), leading directly to the exit code 137 (128 + 9).
graph TD
    A[Container Process Requests Memory] --> B{Memory Usage Exceeds Limit?}
    B -- No --> C[Process Continues Running]
    B -- Yes --> D[Linux Kernel OOM Killer Invoked]
    D --> E["OOM Killer Selects and Terminates a 'Guilty' Process"]
    E --> F["Kernel Sends SIGKILL (Signal 9)"]
    F --> G[Container Process Receives SIGKILL]
    G --> H[Container Process Exits]
    H --> I[Container Runtime Reports Exit Code 137]
    I --> J["Container Runtime Also Reports OOMKilled: true"]
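The limit the OOM killer enforces is simply the value Docker writes into the container's memory cgroup at start-up; it can be read back from inside the container (file paths differ between cgroup v1 and v2):

```bash
# Expect 268435456 (256 MiB) on either cgroup hierarchy:
docker run --rm -m 256m alpine sh -c \
  'cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes'
```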
✅ Verification
After implementing any of the solutions, verify the container's stability and exit status.
- Check the container's state and OOMKilled status.
- Review logs for any memory-related warnings.
- Monitor real-time memory usage.
- For Kubernetes, examine pod descriptions and previous logs.
- Check host-level `dmesg` or `journalctl` for OOM killer activity.
docker inspect <container_id> --format='{{json .State}}' | jq '.OOMKilled, .ExitCode, .Error'
docker logs <container_id> | grep -i 'memory\|oom\|killed'
docker stats <container_id> --no-stream
kubectl describe pod <pod_name> | grep -A 10 'State\|Last State'
kubectl logs <pod_name> --previous
sudo journalctl -u docker -n 50 | grep -i 'oom\|killed'
docker run --rm <image_id> /bin/sh -c 'free -h && ps aux'
kubectl top nodes
kubectl top pods -A --sort-by=memory
📦 Prerequisites
To effectively diagnose and resolve exit code 137, ensure you have the following tools and versions:
- Container Runtimes: Docker 18.09+ or Kubernetes 1.14+
- Operating System: Linux kernel 4.15+ (for robust cgroup v2 and OOM killer support)
- CLI Tools: `curl` or `wget` for health checks, `jq` for JSON parsing (highly recommended)
- Permissions: `sudo` access on the host for kernel tuning and `dmesg`/`journalctl` access
- Orchestration: `docker-compose` 1.25+ or `kubectl` 1.20+
- Profiling Tools: Language-specific memory profiling tools (e.g., `memory-profiler` for Python, Chrome DevTools for Node.js, `jmap` for Java)