Fixing Docker Container Exit Code 137 on Oracle Linux 9: Docker Compose Restart Loop
🚨 Symptoms & Diagnosis¶
Encountering Exit Code 137 when managing Docker containers via docker-compose on Oracle Linux 9 typically signals critical resource exhaustion, predominantly memory-related. This frequently manifests as containers entering a persistent restart loop, severely impacting application availability and overall system stability. Identifying the exact trigger is paramount for effective remediation.
Common error signatures include containers repeatedly showing `Exited (137)` in `docker ps -a` output and abrupt `Killed` messages in container logs. Direct inspection of container state confirms the exit code and, often, the underlying cause:
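The container state can be pulled with `docker inspect` (a sketch; `myapp` is a placeholder container name, substitute your own):

```shell
# Dump only the State object for the suspect container.
CONTAINER=myapp   # placeholder name
docker inspect "$CONTAINER" --format='{{json .State}}'
```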
Expected output showing the `OOMKilled` flag:

```json
"State": {
    "Status": "exited",
    "Running": false,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": true,
    "Dead": false,
    "Pid": 0,
    "ExitCode": 137,
    "Error": "",
    "StartedAt": "2023-10-27T10:00:05.123Z",
    "FinishedAt": "2023-10-27T10:00:10.987Z"
},
```
Additional diagnostic indicators may include:
- `docker inspect <container>`: `"ExitCode": 137`
- `docker logs <container>`: `Killed` (often without an explicit Out-of-Memory message in application logs)
- `journalctl -u docker`: `Out of memory: Kill process`
Root Cause: Exit code 137 is fundamentally triggered by the Linux kernel's Out-of-Memory (OOM) killer. It issues a SIGKILL (signal 9) to terminate processes, including Docker containers, when memory demand exceeds either the container's explicit memory limit or the Docker host's available memory. This often leads to persistent restart loops in `docker-compose` environments, indicative of critical resource contention.
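The mechanism is easy to reproduce in isolation: a container that exceeds its memory limit is killed with SIGKILL and reports exit code 137. A sketch (the `alpine` image, the `oomdemo` name, and the 64 MB limit are arbitrary choices for the demo):

```shell
# tail buffers all of stdin in memory, so it quickly blows past the 64 MB limit
# and gets terminated by the OOM killer.
docker run --name oomdemo --memory=64m --memory-swap=64m alpine \
  sh -c 'head -c 256m /dev/zero | tail'
docker inspect oomdemo --format='ExitCode: {{.State.ExitCode}}, OOMKilled: {{.State.OOMKilled}}'
docker rm oomdemo
```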
🛠️ Solutions¶
Immediate Diagnosis & Quick Fix¶
Immediate Mitigation: Temporarily Increase Memory Limits
To stabilize your environment and mitigate ongoing service disruptions, rapidly diagnose the OOMKilled status and temporarily adjust memory allocations. This provides operational breathing room while you implement a more robust, permanent solution.
- Verify `OOMKilled` Status and Exit Code: Confirm whether the container was explicitly killed by the OOM killer.
- Monitor Real-time Container Memory Usage: Assess the container's current and historical memory footprint.
- Check Docker Host System Memory Pressure: Determine whether the underlying host system is experiencing memory exhaustion.
- Review `docker-compose.yml` for Resource Allocation: Examine the `docker-compose.yml` file for existing `mem_limit` and `memswap_limit` settings under the affected service.
- Temporarily Increase Memory Limits: Adjust `mem_limit` and `memswap_limit` in your `docker-compose.yml` to provide additional memory. Exercise caution not to over-allocate, which could starve other services or the host system.
- Restart Docker Compose Services: Apply the updated resource configurations and restart your services.
- Monitor Post-Restart Stability: Observe container memory usage and status to confirm stability.
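The sequence above can be sketched as follows (assumes Compose v2 and a service/container named `app`; both are placeholders):

```shell
# 1-3: diagnose the container and the host.
docker inspect app --format='OOMKilled: {{.State.OOMKilled}}, ExitCode: {{.State.ExitCode}}'
docker stats --no-stream app                # point-in-time memory usage vs. limit
free -h                                     # host-level memory pressure
# 4: check currently configured limits.
grep -A1 'mem_limit' docker-compose.yml     # shows mem_limit/memswap_limit, if set
# 6-7: apply the raised limits and confirm the service stays up.
docker compose up -d --force-recreate app
docker compose ps
```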
Permanent Fix: Optimize Memory Allocation & Application¶
Best Practice Fix: Resource Optimization and Application Tuning
For sustained stability and optimal performance, a thorough analysis of application memory consumption, paired with precise resource allocation and resilient health checks, is critical. This approach targets the root cause of Exit Code 137.
- Profile Application Memory Usage: Utilize language-specific profiling tools (e.g., `memory_profiler` for Python) to understand your application's memory footprint under various load conditions.
- Identify and Resolve Memory Leaks: Memory leaks are a common source of gradual memory exhaustion. Debug your application code to pinpoint and eliminate these issues.
- Set Appropriate Memory Requests and Limits: Based on profiling data, configure `mem_limit` (hard ceiling) and `mem_reservation` (soft limit for scheduling) in `docker-compose.yml` to reflect actual application requirements.
- Configure Robust Health Checks: Implement `healthcheck` configurations with adequate `start_period`, `interval`, `timeout`, and `retries`. This prevents Docker from prematurely terminating a container that is still initializing or temporarily unresponsive.
- Implement Resource Quotas (for multi-service environments): In more complex deployments, consider resource quotas across services to prevent any single service from monopolizing host memory resources.
- Conduct Load Testing: Thoroughly test your application under simulated production load to validate memory allocations and identify potential bottlenecks before production deployment.
- Proactive Monitoring with Docker Stats and Kernel Logs: Continuously monitor container metrics and review kernel OOM events for any signs of memory pressure.
- Establish Alerting: Configure monitoring and alerting for memory usage exceeding predefined thresholds (e.g., 70-80% of `mem_limit`) and for host-level OOM killer activations.

Example `docker-compose.yml` with optimized resource configuration and health checks:

```yaml
version: '3.8'
services:
  app:
    image: myapp:latest
    mem_limit: 1g           # Hard memory limit (RAM)
    memswap_limit: 1g       # Total memory (RAM + swap) limit
    mem_reservation: 512m   # Soft reservation; Docker attempts to keep usage below this
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]  # Example health endpoint
      interval: 30s         # Check every 30 seconds
      timeout: 10s          # Allow 10 seconds for the check to complete
      retries: 3            # Three consecutive failures to be considered unhealthy
      start_period: 40s     # Initial delay before health checks begin
    restart: unless-stopped # Define proper restart behavior
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

Verify Memory Limits Enforcement:

Test Restart Behavior Post-Configuration:
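A sketch of these two checks, again assuming a service/container named `app`:

```shell
# Confirm the limits Docker actually applied (values reported in bytes; 1g = 1073741824).
docker inspect app --format='Memory: {{.HostConfig.Memory}}, MemorySwap: {{.HostConfig.MemorySwap}}, Reservation: {{.HostConfig.MemoryReservation}}'
# Recreate the service, then watch restart/health behavior settle.
docker compose up -d --force-recreate app
docker compose ps app
```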
Advanced: Horizontal Scaling & Resource Quotas¶
Best Practice Fix: Distributed Resources and Scalability
For high-availability, high-traffic, or performance-sensitive applications, scaling services horizontally and implementing orchestration-level resource quotas offer robust protection against OOM conditions and single points of failure.
- Implement Docker Compose Scaling (via Docker Swarm) or Kubernetes Deployment: Distribute the application load across multiple container instances. While `docker-compose` itself primarily manages a single instance for `deploy`, Docker Swarm mode (`docker stack deploy`) or a dedicated Kubernetes cluster are the standard approaches for true scaling.
- Configure Load Balancer: Utilize an external load balancer (e.g., Nginx, HAProxy) or an ingress controller (in Kubernetes) to efficiently distribute incoming traffic among the scaled application replicas.
- Set Resource Quotas Per Service/Namespace: In orchestrated environments, define CPU and memory quotas at the service or namespace level to ensure fair resource sharing and prevent any single service from exhausting host resources.

Example `docker-compose.yml` (for Docker Swarm deployment with a `deploy` section):

```yaml
version: '3.8'
services:
  app:
    image: myapp:latest
    deploy:
      replicas: 3            # Scale to 3 instances
      resources:
        limits:
          cpus: '0.5'        # Each replica limited to 0.5 CPU cores
          memory: 512M       # Each replica limited to 512MB RAM
        reservations:
          cpus: '0.25'       # Each replica reserves 0.25 CPU cores
          memory: 256M       # Each replica reserves 256MB RAM
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
  nginx:                     # Example load balancer
    image: nginx:latest
    ports:
      - "80:80"
    depends_on:
      - app
```
- Enable Autoscaling: For dynamically varying workloads, configure autoscaling mechanisms (e.g., the Kubernetes Horizontal Pod Autoscaler) based on memory utilization metrics.
- Implement Centralized Logging: Aggregate logs from all container instances into a centralized system for comprehensive memory tracking, performance analysis, and anomaly detection.
- Configure Advanced Alerting Thresholds: Set up proactive alerts for memory usage exceeding 70-80% of allocated limits across all service replicas to enable timely intervention.
Monitor Memory Across All Containers:
Check HostConfig for Applied Resource Limits:
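The two checks above can be sketched as follows (`app` is a placeholder container name):

```shell
# Memory usage across all running containers at a glance.
docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'
# Limits actually applied to one replica (Memory in bytes, CPUs in NanoCpus,
# where 0.5 CPU = 500000000).
docker inspect app --format='Memory: {{.HostConfig.Memory}}, NanoCpus: {{.HostConfig.NanoCpus}}'
```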
Debugging: Log Parsing & Signal Analysis¶
Best Practice Fix: Deep Dive into Diagnostics
If standard memory-related solutions do not resolve the issue, or if OOMKilled is false despite Exit Code 137, a detailed forensic analysis of Docker daemon logs, kernel events, and application logs is indispensable to identify non-OOM SIGKILL triggers.
- Capture Docker Daemon Logs: The Docker daemon logs provide critical insights into container lifecycle events, including termination reasons.
- Parse Kernel OOM Killer Events: The kernel logs (`dmesg` or `journalctl` with kernel filters) are the authoritative source for OOM killer activations.
- Check for Health Check Failures: A misconfigured or consistently failing health check can lead Docker or an orchestrator to send a `SIGKILL`, resulting in Exit Code 137 without an explicit OOM event.
- Analyze Application Logs for Internal Crashes: Application-level errors, unhandled exceptions, or segmentation faults (`SIGSEGV`) can also cause process termination, potentially triggering a `SIGKILL` by Docker's supervisor, which would manifest as Exit Code 137.
- Correlate Timestamps Across Log Sources: Match timestamps from `docker logs`, `journalctl -u docker`, and `dmesg` to reconstruct the precise sequence of events leading to the container's termination.
- Verify SIGKILL vs. OOMKilled Distinction: Use `docker inspect` to explicitly differentiate between an `OOMKilled` event and other `SIGKILL` signals. If `OOMKilled` is `false` but `ExitCode` is `137`, another process (e.g., an orchestrator, or another part of Docker's supervision) sent the `SIGKILL`.

```shell
docker inspect <container_name> --format='OOMKilled: {{.State.OOMKilled}}, ExitCode: {{.State.ExitCode}}, Error: {{.State.Error}}'
```

Real-time System Monitoring:
Capture Memory Pressure Events (if Pressure Stall Information (PSI) is available on kernel):
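These diagnostics can be sketched as follows (`app` is a placeholder name; `/proc/pressure/memory` exists only when PSI is enabled, which on some kernels requires the `psi=1` boot parameter):

```shell
# Docker daemon lifecycle and OOM-related messages from the last hour.
journalctl -u docker --since "1 hour ago" --no-pager | grep -iE 'oom|kill|137'
# Kernel OOM killer activations with human-readable timestamps.
dmesg -T | grep -iE 'out of memory|oom-kill'
# Health-check history for a container (populated only if a healthcheck is configured).
docker inspect app --format='{{json .State.Health}}'
# Memory pressure stall information (PSI), if available.
cat /proc/pressure/memory
```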
🧩 Technical Context (Visualized)¶
Exit Code 137 precisely indicates that a container process was terminated by a SIGKILL signal (signal 9) from the operating system. In the context of Docker, especially with resource constraints, this signal is most frequently issued by the Linux kernel's Out-of-Memory (OOM) killer. The OOM killer intervenes when system or container memory limits are surpassed, strategically terminating processes to reclaim memory and prevent critical system instability. This often precipitates a docker-compose restart loop, as the container repeatedly attempts to initiate, exhausts its memory, gets killed, and then attempts to restart again.
```mermaid
graph TD
    A[Docker Container Application] --> B{Memory Consumption Increases};
    B -- Exceeds Docker Mem_Limit --> C["Container Runtime (Docker)"];
    B -- Exceeds Host RAM / Swap --> D[Linux Kernel OOM Killer];
    C -- "No available memory or swap, or container limit hit" --> D;
    D -- "Sends SIGKILL (Signal 9)" --> E[Container Process Terminated];
    E -- Reports --> F[Exit Code 137];
    F --> G[Docker Daemon];
    G -- "restart: unless-stopped" Policy --> H{docker-compose Initiates Restart};
    H -- Leads to --> I[Persistent Restart Loop];
    D -- Logs Events To --> J["Kernel Log (dmesg, journalctl)"];
    G -- Logs Events To --> K["Docker Daemon Log (journalctl -u docker)"];
```
✅ Verification¶
After implementing any of the proposed solutions, systematically verify the container's operational status and resource consumption to confirm the fix:
- Check Container State and `OOMKilled` Flag: Expected output should be `0` for `ExitCode` and `false` for `OOMKilled`, indicating a clean exit or a continuously running container.
- Monitor Live Container Resource Statistics: Observe memory usage to ensure it remains stable and well within the configured `mem_limit`.
- Assess Docker Host Memory Availability: Confirm that the Docker host system has sufficient free memory, reducing the likelihood of host-level OOM events.
- Validate Docker Compose Service Status: Verify that all your services are in the `Up` state and not repeatedly exiting or restarting.
- Review Recent Container Logs: Look for any `Killed` messages, application-level errors, or unusual termination patterns.
- Inspect Docker Daemon and Kernel Logs for OOM Events: Confirm the absence of new OOM-related messages or Exit Code 137 entries.
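Collected into one verification pass (a sketch; `app` is a placeholder service/container name):

```shell
docker inspect app --format='ExitCode: {{.State.ExitCode}}, OOMKilled: {{.State.OOMKilled}}'  # want 0 / false
docker stats --no-stream app                 # memory should sit well under mem_limit
free -h                                      # host headroom
docker compose ps                            # all services Up, no restart churn
docker logs --tail 100 app                   # no "Killed" messages or crash traces
journalctl -u docker --since "30 min ago" --no-pager | grep -i oom || echo "no new OOM events"
```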
📦 Prerequisites¶
To effectively apply and troubleshoot these solutions, ensure your environment meets the following prerequisites:
- Docker Engine: Version 20.10 or newer.
- Docker Compose: Version 2.0 or newer.
- Operating System: Oracle Linux 9 (or any compatible RHEL 9 distribution).
- Access Privileges: `sudo` or `root` access is required for inspecting kernel logs (`dmesg`, `journalctl`).
- Utilities: `curl` or `wget` might be necessary for implementing health checks within containers; `jq` is highly recommended for efficient JSON parsing of Docker inspection output.
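A quick environment check for these prerequisites (a sketch; exact version strings will vary):

```shell
docker --version            # want 20.10 or newer
docker compose version      # want v2.x (Compose plugin)
cat /etc/oracle-release 2>/dev/null || cat /etc/redhat-release   # OS identification
command -v jq curl          # optional but recommended helpers
```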