Docker 📅 2026-02-07

Fixing Docker Container Exit Code 137 on Oracle Linux 9: Docker Compose Restart Loop

🚨 Symptoms & Diagnosis

Encountering Exit Code 137 when managing Docker containers via docker-compose on Oracle Linux 9 typically signals critical resource exhaustion, predominantly memory-related. This frequently manifests as containers entering a persistent restart loop, severely impacting application availability and overall system stability. Identifying the exact trigger is paramount for effective remediation.

Common error signatures include:

docker-compose up: container exited with code 137

Direct inspection of container state confirms the exit code and often, the underlying cause:

docker inspect <container_name> | grep -A 5 State
Expected output showing the OOMKilled flag:
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": true,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2023-10-27T10:00:05.123Z",
            "FinishedAt": "2023-10-27T10:00:10.987Z"
        },

Additional diagnostic indicators may include:

docker inspect <container>: "ExitCode": 137
docker logs <container>: Killed (often without an explicit Out-of-Memory message in application logs)
journalctl -u docker: Out of memory: Kill process

Root Cause: Exit code 137 is fundamentally triggered by the Linux kernel's Out-of-Memory (OOM) killer. It issues a SIGKILL (signal 9) to terminate processes, including Docker containers, when memory demand exceeds either the container's explicit memory limit or the Docker host's available memory. This often leads to persistent restart loops in docker-compose environments, indicative of critical resource contention.
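The value 137 itself is plain exit-code arithmetic: 128 plus the fatal signal number, and SIGKILL is signal 9 (128 + 9 = 137). This can be reproduced in any shell, no Docker required:

```shell
# A process killed by SIGKILL reports exit code 137 (128 + 9).
sh -c 'kill -KILL $$'   # the child shell SIGKILLs itself
echo "exit code: $?"    # prints: exit code: 137
```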


🛠️ Solutions

Immediate Diagnosis & Quick Fix

Immediate Mitigation: Temporarily Increase Memory Limits

To stabilize your environment and mitigate ongoing service disruptions, rapidly diagnose the OOMKilled status and temporarily adjust memory allocations. This provides operational breathing room while you implement a more robust, permanent solution.

  1. Verify OOMKilled Status and Exit Code: Confirm if the container was explicitly killed by the OOM killer.

    docker inspect <container_name> | grep -E '"OOMKilled"|"ExitCode"'
    

  2. Monitor Real-time Container Memory Usage: Assess the container's current and historical memory footprint.

    docker stats --no-stream <container_name>
    

  3. Check Docker Host System Memory Pressure: Determine if the underlying host system is experiencing memory exhaustion.

    free -h
    grep -E 'MemTotal|MemAvailable|MemFree' /proc/meminfo
    

  4. Review docker-compose.yml for Resource Allocation: Examine the docker-compose.yml file for existing mem_limit and memswap_limit settings under the affected service.

  5. Temporarily Increase Memory Limits: Adjust mem_limit and memswap_limit in your docker-compose.yml to provide additional memory. Exercise caution not to over-allocate, which could starve other services or the host system.

    services:
      app:
        image: myapp:latest
        mem_limit: 2g # Temporarily increase RAM limit to 2GB
        memswap_limit: 2g # Total (RAM + swap) limit; setting it equal to mem_limit disables swap for this container
        restart: unless-stopped
    
  6. Restart Docker Compose Services: Apply the updated resource configurations and restart your services.

    docker-compose down && docker-compose up -d
    

  7. Monitor Post-Restart Stability: Observe container memory usage and status to confirm stability.

    docker stats --no-stream <container_name>
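A subtlety in step 5 worth calling out: memswap_limit is the total of RAM plus swap, so the swap actually available to the container is memswap_limit minus mem_limit, and setting the two values equal disables swap for that container entirely. A quick sanity check of the arithmetic:

```shell
# memswap_limit counts RAM + swap, so container swap = memswap_limit - mem_limit.
mem_limit_mb=2048       # mem_limit: 2g
memswap_limit_mb=2048   # memswap_limit: 2g
echo "container swap: $(( memswap_limit_mb - mem_limit_mb )) MB"   # 0 MB => swap disabled
```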
    

Permanent Fix: Optimize Memory Allocation & Application

Best Practice Fix: Resource Optimization and Application Tuning

For sustained stability and optimal performance, a thorough analysis of application memory consumption, paired with precise resource allocation and resilient health checks, is critical. This approach targets the root cause of Exit Code 137.

  1. Profile Application Memory Usage: Utilize language-specific profiling tools (e.g., memory_profiler for Python) to understand your application's memory footprint under various load conditions.

    python -m memory_profiler script.py
    

  2. Identify and Resolve Memory Leaks: Memory leaks are a common source of gradual memory exhaustion. Debug your application code to pinpoint and eliminate these issues.

  3. Set Appropriate Memory Requests and Limits: Based on profiling data, configure mem_limit (hard ceiling) and mem_reservation (soft limit enforced when the host comes under memory pressure) in docker-compose.yml to reflect actual application requirements.

  4. Configure Robust Health Checks: Implement healthcheck configurations with adequate start_period, interval, timeout, and retries. This prevents Docker from prematurely terminating a container that is still initializing or temporarily unresponsive.

  5. Implement Resource Quotas (for multi-service environments): In more complex deployments, consider resource quotas across services to prevent any single service from monopolizing host memory resources.

  6. Conduct Load Testing: Thoroughly test your application under simulated production load to validate memory allocations and identify potential bottlenecks before production deployment.

  7. Proactive Monitoring with Docker Stats and Kernel Logs: Continuously monitor container metrics and review kernel OOM events for any signs of memory pressure.

    docker stats <container_name>
    sudo journalctl -u docker -n 100 | grep -i 'out of memory'
    sudo dmesg | grep -i 'oom-kill'
    

  8. Establish Alerting: Configure monitoring and alerting for memory usage exceeding predefined thresholds (e.g., 70-80% of mem_limit) and for host-level OOM killer activations.

    Example docker-compose.yml with optimized resource configuration and health checks:

    version: '3.8'
    services:
      app:
        image: myapp:latest
        mem_limit: 1g             # Hard memory limit (RAM)
        memswap_limit: 1g         # Total memory (RAM + swap) limit
        mem_reservation: 512m     # Soft limit; enforced when the host is under memory pressure
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost:8080/health"] # Example health endpoint
          interval: 30s           # Check every 30 seconds
          timeout: 10s            # Allow 10 seconds for the check to complete
          retries: 3              # Three consecutive failures to be considered unhealthy
          start_period: 40s       # Initial delay before health checks begin
        restart: unless-stopped # Define proper restart behavior
        logging:
          driver: "json-file"
          options:
            max-size: "10m"
            max-file: "3"
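With the healthcheck timings above, a container that fails every probe is marked unhealthy no sooner than roughly start_period plus retries × interval (each probe's timeout can add a little more):

```shell
# Rough worst-case delay before "unhealthy" with the example healthcheck settings.
start_period=40   # seconds during which failures do not count
interval=30       # seconds between probes
retries=3         # consecutive failures required
echo "unhealthy after ~$(( start_period + retries * interval ))s"
```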
    

    Verify Memory Limits Enforcement:

    docker inspect <container> | grep -E 'Memory|MemorySwap'
    

    Test Restart Behavior Post-Configuration:

    docker-compose up -d
    sleep 60 # Allow time for services to stabilize or manifest issues
    docker-compose ps
    docker logs <container_name> | tail -20
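Turning profiling numbers into a mem_limit is ultimately a judgment call; a common rule of thumb (an assumption on our part, not an official Docker guideline) is peak observed RSS plus roughly 30% headroom:

```shell
# Hypothetical sizing: mem_limit = peak RSS + ~30% headroom.
# peak_mb is an assumed value taken from load testing, not a measurement.
peak_mb=780
limit_mb=$(( (peak_mb * 13 + 9) / 10 ))   # integer ceiling of peak * 1.3
echo "suggested mem_limit: ${limit_mb}m"
```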
    

Advanced: Horizontal Scaling & Resource Quotas

Best Practice Fix: Distributed Resources and Scalability

For high-availability, high-traffic, or performance-sensitive applications, scaling services horizontally and implementing orchestration-level resource quotas offer robust protection against OOM conditions and single points of failure.

  1. Implement Docker Compose Scaling (via Docker Swarm) or Kubernetes Deployment: Distribute the application load across multiple container instances. Compose can run extra replicas on a single host (docker compose up -d --scale app=3), but settings in the deploy: section such as replicas and restart_policy are fully honored by Docker Swarm mode (docker stack deploy); for true multi-host scaling, Swarm or a dedicated Kubernetes cluster is the standard approach.

  2. Configure Load Balancer: Utilize an external load balancer (e.g., Nginx, HAProxy) or an ingress controller (in Kubernetes) to efficiently distribute incoming traffic among the scaled application replicas.

  3. Set Resource Quotas Per Service/Namespace: In orchestrated environments, define CPU and memory quotas at the service or namespace level to ensure fair resource sharing and prevent any single service from exhausting host resources.

    Example docker-compose.yml (for Docker Swarm deployment with deploy section):

    version: '3.8'
    services:
      app:
        image: myapp:latest
        deploy:
          replicas: 3 # Scale to 3 instances
          resources:
            limits:
              cpus: '0.5' # Each replica limited to 0.5 CPU cores
              memory: 512M # Each replica limited to 512MB RAM
            reservations:
              cpus: '0.25' # Each replica reserves 0.25 CPU cores
              memory: 256M # Each replica reserves 256MB RAM
          restart_policy:
            condition: on-failure
            delay: 5s
            max_attempts: 3
            window: 120s
      nginx: # Example load balancer
        image: nginx:latest
        ports:
          - "80:80"
        depends_on:
          - app
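A quick capacity check for the deploy settings above: worst-case memory is replicas × per-replica limit, and that product must fit on the host (or across the Swarm node pool) alongside everything else:

```shell
# Worst-case footprint of the scaled service: replicas * per-replica limit.
replicas=3
limit_mb=512    # memory: 512M per replica
echo "worst-case memory footprint: $(( replicas * limit_mb )) MB"
```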
    
  4. Enable Autoscaling: For dynamically varying workloads, configure autoscaling mechanisms (e.g., Kubernetes Horizontal Pod Autoscaler) based on memory utilization metrics.

  5. Implement Centralized Logging: Aggregate logs from all container instances into a centralized system for comprehensive memory tracking, performance analysis, and anomaly detection.

  6. Configure Advanced Alerting Thresholds: Set up proactive alerts for memory usage exceeding 70-80% of allocated limits across all service replicas to enable timely intervention.

    Monitor Memory Across All Containers:

    docker stats --all --no-stream
    

    Check HostConfig for Applied Resource Limits:

    docker inspect <container> | grep -A 10 'HostConfig'
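The 70-80% alert thresholds can be scripted by parsing docker stats output. A sketch with awk, run here against a hard-coded sample line standing in for real output from docker stats --no-stream --format '{{.Name}} {{.MemUsage}}' (it assumes usage and limit share the same unit):

```shell
# Flag containers whose memory usage exceeds 80% of their limit.
line="app 410MiB / 512MiB"   # sample; real input comes from docker stats
echo "$line" | awk -v thresh=80 '{
  used = $2 + 0; limit = $4 + 0               # numeric prefix; MiB suffix ignored
  pct = used / limit * 100
  printf "%s: %.1f%% of limit%s\n", $1, pct, (pct > thresh ? " [ALERT]" : "")
}'
```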
    

Debugging: Log Parsing & Signal Analysis

Best Practice Fix: Deep Dive into Diagnostics

If standard memory-related solutions do not resolve the issue, or if OOMKilled is false despite Exit Code 137, a detailed forensic analysis of Docker daemon logs, kernel events, and application logs is indispensable to identify non-OOM SIGKILL triggers.

  1. Capture Docker Daemon Logs: The Docker daemon logs provide critical insights into container lifecycle events, including termination reasons.

    sudo journalctl -u docker -f --since '10 minutes ago'
    

  2. Parse Kernel OOM Killer Events: The kernel logs (dmesg or journalctl with kernel filters) are the authoritative source for OOM killer activations.

    sudo dmesg | tail -50
    sudo journalctl -p err -n 100 | grep -i 'memory\|oom\|kill'
    

  3. Check for Health Check Failures: Plain Docker only marks a failing container unhealthy; it does not kill it. Under an orchestrator (Docker Swarm, Kubernetes), however, a misconfigured or consistently failing health check causes the container to be replaced, and a SIGKILL follows if the process ignores the graceful-stop signal, producing Exit Code 137 without any OOM event.

    docker logs --tail 100 <container_name>
    

  4. Analyze Application Logs for Internal Crashes: Application-level errors, unhandled exceptions, or crashes can also terminate the main process, but note the signal arithmetic: a segmentation fault (SIGSEGV) surfaces as Exit Code 139 (128 + 11), not 137, so a genuine 137 always indicates an external SIGKILL rather than an in-process crash.

  5. Correlate Timestamps Across Log Sources: Match timestamps from docker logs, journalctl -u docker, and dmesg to reconstruct the precise sequence of events leading to the container's termination.

  6. Verify SIGKILL vs. OOMKilled Distinction: Use docker inspect to explicitly differentiate between an OOMKilled event and other SIGKILL sources. If OOMKilled is false but ExitCode is 137, something external sent the SIGKILL: an orchestrator replacing an unhealthy task, a manual docker kill, or docker stop escalating to SIGKILL after its SIGTERM grace period expired.

    docker inspect <container_name> --format='OOMKilled: {{.State.OOMKilled}}, ExitCode: {{.State.ExitCode}}, Error: {{.State.Error}}'
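When correlating these sources, a tiny helper (illustrative only, not a Docker command) makes the exit-code arithmetic explicit: values above 128 encode 128 plus the fatal signal number:

```shell
# Decode a container exit code: > 128 means "terminated by signal (code - 128)".
decode_exit() {
  if [ "$1" -gt 128 ]; then
    echo "exit $1 => terminated by signal $(( $1 - 128 ))"
  else
    echo "exit $1 => application exit status $1"
  fi
}
decode_exit 137   # signal 9  (SIGKILL): OOM killer or an external kill
decode_exit 139   # signal 11 (SIGSEGV): in-process crash, not an external kill
```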
    

    Real-time System Monitoring:

    watch -n 1 'docker stats --no-stream && echo "---" && free -h'
    

    Capture Memory Pressure Events (requires Pressure Stall Information (PSI), which Oracle Linux 9 kernels expose under /proc/pressure):

    cat /proc/pressure/memory 2>/dev/null || echo 'PSI not available'
    

🧩 Technical Context (Visualized)

Exit Code 137 precisely indicates that a container process was terminated by a SIGKILL signal (signal 9) from the operating system. In the context of Docker, especially with resource constraints, this signal is most frequently issued by the Linux kernel's Out-of-Memory (OOM) killer. The OOM killer intervenes when system or container memory limits are surpassed, strategically terminating processes to reclaim memory and prevent critical system instability. This often precipitates a docker-compose restart loop, as the container repeatedly attempts to initiate, exhausts its memory, gets killed, and then attempts to restart again.

graph TD
    A[Docker Container Application] --> B{Memory Consumption Increases};
    B -- Exceeds Docker Mem_Limit --> C["Container Runtime (Docker)"];
    B -- Exceeds Host RAM / Swap --> D[Linux Kernel OOM Killer];
    C -- "No available memory or swap, or container limit hit" --> D;
    D -- "Sends SIGKILL (Signal 9)" --> E[Container Process Terminated];
    E -- Reports --> F[Exit Code 137];
    F --> G[Docker Daemon];
    G -- "restart: unless-stopped" Policy --> H{docker-compose Initiates Restart};
    H -- Leads to --> I[Persistent Restart Loop];
    D -- Logs Events To --> J["Kernel Log (dmesg, journalctl)"];
    G -- Logs Events To --> K["Docker Daemon Log (journalctl -u docker)"];

✅ Verification

After implementing any of the proposed solutions, systematically verify the container's operational status and resource consumption to confirm the fix:

  1. Check Container State and OOMKilled Flag:

    docker inspect <container_name> --format='{{json .State}}' | jq '.ExitCode, .OOMKilled'
    
    Expected output should be 0 for ExitCode and false for OOMKilled, indicating a clean exit or continuous running.

  2. Monitor Live Container Resource Statistics:

    docker stats --no-stream <container_name>
    
    Observe memory usage to ensure it remains stable and well within the configured mem_limit.

  3. Assess Docker Host Memory Availability:

    free -h
    
    Confirm that the Docker host system has sufficient free memory, reducing the likelihood of host-level OOM events.

  4. Validate Docker Compose Service Status:

    docker-compose ps
    
    Verify that all your services are in the Up state and not repeatedly exiting or restarting.

  5. Review Recent Container Logs:

    docker logs <container_name> | tail -20
    
    Look for any Killed messages, application-level errors, or unusual termination patterns.

  6. Inspect Docker Daemon and Kernel Logs for OOM Events:

    sudo journalctl -u docker -n 20 | grep -i '137\|oom\|kill'
    sudo dmesg | grep -i 'oom-kill'
    
    Confirm the absence of new OOM-related messages or Exit Code 137 entries.

📦 Prerequisites

To effectively apply and troubleshoot these solutions, ensure your environment meets the following prerequisites:

  • Docker Engine: Version 20.10 or newer.
  • Docker Compose: Version 2.0 or newer.
  • Operating System: Oracle Linux 9 (or any compatible RHEL 9 distribution).
  • Access Privileges: sudo or root access is required for inspecting kernel logs (dmesg, journalctl).
  • Utilities: curl or wget might be necessary for implementing health checks within containers. jq is highly recommended for efficient JSON parsing of Docker inspection output.