GPU Not Used in Ollama? Full Fix Guide (NVIDIA / AMD)

So you installed Ollama on your homelab server or desktop, pulled a model like Llama 3 or Mistral, and started chatting — only to realize your GPU is sitting idle while the CPU is taking all the heat. If you have opened nvidia-smi and seen zero utilization, or your AMD GPU shows no activity during inference, you are not alone. This is one of the most common pain points in self-hosting local AI.

This guide covers the verified fixes for both NVIDIA and AMD GPUs across bare-metal Linux installations and Docker containers, based on the official Ollama documentation and community troubleshooting resources.

Why Ollama Falls Back to CPU

Ollama detects compatible GPUs when the server starts and records the result in the server log. When you run ollama serve, the log includes lines such as "looking for compatible GPUs", followed by "no compatible GPUs were discovered" when detection fails.

The most common reasons for GPU fallback include:

Missing or outdated GPU drivers on the host system

Insufficient permissions to access GPU devices (especially with AMD)

Missing NVIDIA Container Toolkit when running inside Docker

Unsupported GPU architecture not included in Ollama's pre-built library

Systemd cgroup management interference in Docker

Let us go through the fixes step by step.

Prerequisites

Before diving into fixes, make sure you have:

Ollama installed (either bare-metal or via Docker)

A compatible GPU (NVIDIA with CUDA compute capability 5.0 or higher, AMD Radeon with ROCm support, or a Vulkan-capable GPU)

Root or sudo access on your Linux machine

Diagnosing the Problem

Enable Debug Logging

Set the environment variable OLLAMA_DEBUG=1 before starting the Ollama server. According to the Ollama troubleshooting guide, this reveals more detailed error codes during GPU discovery.

bash

OLLAMA_DEBUG=1 ollama serve

Look for log lines that mention GPU detection. If you see "no compatible GPUs were discovered", the server could not find usable hardware.
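When Ollama runs as a systemd service rather than in a foreground terminal, the same messages end up in the journal instead. The sketch below filters log text for the discovery messages quoted in this guide. The function name is made up for illustration, and the service name ollama is an assumption based on the official Linux install script.

```shell
# Hypothetical helper: keep only the GPU discovery lines from Ollama log text on stdin.
filter_gpu_discovery() {
  grep -E 'looking for compatible GPUs|no compatible GPUs were discovered|inference compute'
}

# Small demo with made-up log lines (the middle line is filtered out):
printf '%s\n' \
  'msg="looking for compatible GPUs"' \
  'msg="Listening on 127.0.0.1:11434"' \
  'msg="no compatible GPUs were discovered"' | filter_gpu_discovery

# On a real host (assumes a systemd unit named "ollama"):
#   journalctl -u ollama --no-pager | filter_gpu_discovery
```

The same filter can be applied to any other log source by piping its output into the function.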

Check GPU Visibility on the Host

For NVIDIA:

bash

nvidia-smi

This command shows your GPU model, driver version, CUDA version, and memory usage. If this command fails, the NVIDIA driver is not installed correctly on the host.
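A GPU can be visible to the driver and still be too old for Ollama's CUDA build. As a sketch, reasonably recent drivers can report compute capability through the compute_cap query field of nvidia-smi, and the hypothetical helper below flags cards under a minimum. The 5.0 default reflects the minimum stated in Ollama's GPU documentation at the time of writing and should be treated as an assumption to verify.

```shell
# Hypothetical helper: read "name, compute_cap" CSV rows on stdin and flag old GPUs.
# $1 = minimum compute capability (default 5.0, an assumption to check against the docs).
check_compute_cap() {
  awk -F', ' -v min="${1:-5.0}" '{
    verdict = ($NF + 0 >= min + 0) ? "ok" : "below minimum"
    print $1 ": " $NF " (" verdict ")"
  }'
}

# Demo with sample rows instead of real hardware:
printf '%s\n' 'NVIDIA GeForce RTX 3060, 8.6' 'Quadro K2000, 3.0' | check_compute_cap

# On a real host:
#   nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader | check_compute_cap
```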

For AMD:

bash

sudo dmesg | grep -i amdgpu
sudo dmesg | grep -i kfd

These commands check kernel messages for AMD GPU and ROCk (KFD) driver errors, as recommended in the Ollama troubleshooting documentation.

Check GPU Visibility Inside Docker

If running Ollama in a Docker container:

bash

docker run --gpus all ubuntu nvidia-smi

As noted in the official Ollama troubleshooting page, if this command does not work, Ollama will not be able to see your NVIDIA GPU inside the container.

Fixing NVIDIA GPU Detection

1. Install or Update NVIDIA Drivers on the Host

Ollama requires a CUDA-compatible NVIDIA driver. On Linux, install the proprietary NVIDIA driver from your distribution's package manager or directly from NVIDIA's Linux driver page. After installation, reboot and verify with:

bash

nvidia-smi

If nvidia-smi reports an error like "No devices were found", your driver installation is incomplete or the GPU is not properly seated.

2. Install the NVIDIA Container Toolkit (for Docker)

When running Ollama in a Docker container, the NVIDIA Container Toolkit is mandatory. Install it on the host system. The official NVIDIA documentation covers the installation steps for various Linux distributions. After installing the toolkit and restarting the Docker daemon, verify with:

bash

docker run --rm --gpus all ubuntu nvidia-smi

3. Run the Ollama Docker Container with GPU Access

Once the NVIDIA Container Toolkit is working, run Ollama with the --gpus all flag:

bash

docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

4. Disable Systemd Cgroup Management (Intermittent GPU Loss)

According to the Ollama troubleshooting guide, if Ollama initially runs on the GPU inside a container but later switches to CPU, and the server log reports GPU discovery failures, systemd cgroup management in Docker may be the cause. The documented fix is to edit /etc/docker/daemon.json on the host, add "exec-opts": ["native.cgroupdriver=cgroupfs"] to the configuration, and restart the Docker daemon.

5. GPU Selection via Environment Variables

Ollama supports the CUDA_VISIBLE_DEVICES environment variable. To force Ollama to use a specific GPU:

bash

CUDA_VISIBLE_DEVICES=0 ollama serve

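Replace 0 with the GPU index reported by nvidia-smi. To force CPU-only inference for comparison, the Ollama documentation suggests setting an invalid ID such as -1.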
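Numeric indices are convenient, but Ollama's GPU documentation notes that device ordering can vary and that UUIDs are the more reliable way to pin a card. The sketch below pulls the UUID for a given index out of nvidia-smi -L style output; the helper name is hypothetical and the sample UUIDs are placeholders.

```shell
# Hypothetical helper: print the UUID of GPU index $1 from `nvidia-smi -L` text on stdin.
gpu_uuid_at() {
  sed -n "s/^GPU $1: .*(UUID: \(GPU-[^)]*\)).*/\1/p"
}

# Demo with placeholder lines in the shape nvidia-smi -L prints:
printf '%s\n' \
  'GPU 0: Example Card A (UUID: GPU-aaaa-1111)' \
  'GPU 1: Example Card B (UUID: GPU-bbbb-2222)' | gpu_uuid_at 1

# On a real host:
#   CUDA_VISIBLE_DEVICES=$(nvidia-smi -L | gpu_uuid_at 0) ollama serve
```

If Ollama runs as a service rather than from your shell, the variable has to be set in the service's environment instead of on the command line.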

6. Linux Suspend / Resume Issue

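The official Ollama documentation describes a known issue on Linux where Ollama fails to discover the NVIDIA GPU after a suspend/resume cycle and falls back to CPU. The documented workaround is to reload the NVIDIA UVM kernel module with sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm, then restart Ollama if it does not pick the GPU up again on its own.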

Fixing AMD GPU Detection

1. Install AMD GPU Drivers and ROCm

Ollama uses ROCm for AMD GPU acceleration on Linux. Install the AMDGPU driver and ROCm stack from AMD's official Linux driver page; the Ollama documentation recommends the latest driver from that source for the best Radeon support.

Check that the amdgpu kernel module is loaded:

bash

lsmod | grep amdgpu

2. Verify User Permissions for /dev/kfd and /dev/dri

A very common error with AMD GPUs is permission problems blocking access to /dev/kfd. The Ollama logs show:

code

amdgpu devices detected but permission problems block access
error="failed to check permission on /dev/kfd: open /dev/kfd: invalid argument"

Fix this by adding your user to the render and video groups:

bash

sudo usermod -aG render,video $USER

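Log out and back in (or reboot) for the changes to take effect. If Ollama runs as a systemd service under its own account (the official install script creates a user named ollama), that account needs the same group membership, and the service must be restarted afterwards.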
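To confirm the group change actually applies, it helps to inspect the group list of the session or account that runs Ollama. The helper below is a minimal sketch with a hypothetical name; it only checks whether a group appears in a space-separated list such as the output of id -nG.

```shell
# Hypothetical helper: succeed if group $1 appears in the space-separated list on stdin.
group_listed() {
  tr ' ' '\n' | grep -qxF "$1"
}

# Demo with a made-up group list:
for g in render video; do
  if printf '%s\n' 'alice sudo video' | group_listed "$g"; then
    echo "$g: yes"
  else
    echo "$g: no"
  fi
done

# On a real host:
#   id -nG | group_listed render          # groups of the current login session
#   id -nG ollama | group_listed render   # service account, if one exists
#   ls -l /dev/kfd /dev/dri/renderD*      # shows which group owns the device nodes
```

Note that id -nG without a user name reflects the current session, which only picks up new groups after logging in again.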

3. Use HSA_OVERRIDE_GFX_VERSION for Unsupported GPUs

If your AMD GPU's architecture is not among the targets built into Ollama's bundled ROCm library, the server will log:

code

amdgpu is not supported gpu_type=gfx1103
supported_types="[gfx1030 gfx1100 gfx1101 gfx1102 ...]"

The Ollama GPU documentation, under "Overrides on Linux", describes the HSA_OVERRIDE_GFX_VERSION environment variable, which makes the ROCm runtime treat the GPU as a similar, supported architecture.

For example:

bash

HSA_OVERRIDE_GFX_VERSION=11.0.0 ollama serve

Check the official documentation for the exact version override for your specific GPU.
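The override value is the version of the supported architecture you want the GPU to be treated as, written as major.minor.patch. Ollama's documentation gives the example of an unsupported gfx1034 card being pointed at gfx1030 with the value 10.3.0. The sketch below converts a gfx target name into that form. It assumes the usual naming scheme in which the last two characters are single hexadecimal digits for minor and stepping, and the function name is hypothetical.

```shell
# Hypothetical helper: convert a gfx target name (e.g. gfx1030) into x.y.z form.
# Assumption: last two characters are hex digits (minor, stepping); the rest is the major.
gfx_to_hsa_version() {
  target=${1#gfx}                                  # gfx1030 -> 1030
  len=${#target}
  major=$(printf '%s' "$target" | cut -c"1-$((len - 2))")
  minor_hex=$(printf '%s' "$target" | cut -c"$((len - 1))")
  step_hex=$(printf '%s' "$target" | cut -c"$len")
  printf '%d.%d.%d\n' "$major" "0x$minor_hex" "0x$step_hex"
}

gfx_to_hsa_version gfx1030   # prints 10.3.0
gfx_to_hsa_version gfx1100   # prints 11.0.0

# Pass the name of a SUPPORTED target close to your GPU, not the GPU's own target:
#   HSA_OVERRIDE_GFX_VERSION=$(gfx_to_hsa_version gfx1100) ollama serve
```

An override only helps when the chosen target is genuinely close to the real hardware; a mismatched value can lead to crashes or incorrect output rather than a clean error.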

4. Container Permissions for AMD GPUs

When running Ollama in a Docker container with AMD GPUs, you must pass through the device files. According to the official ROCm Docker documentation, use:

bash

docker run -d --device=/dev/kfd --device=/dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

Note the :rocm tag on the Ollama image, which contains the ROCm runtime libraries.

5. GPU Selection for AMD

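To limit Ollama to a subset of AMD GPUs, set ROCR_VISIBLE_DEVICES (or HIP_VISIBLE_DEVICES) to a comma-separated list of devices. rocminfo lists the available devices, and the Ollama GPU documentation advises using the device Uuid rather than a numeric index where one is available.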

Fixing GPU Detection in Docker (General)

Verify Docker Runtime Configuration

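For NVIDIA, confirm that the NVIDIA runtime is registered in /etc/docker/daemon.json; running sudo nvidia-ctk runtime configure --runtime=docker and then restarting Docker writes the entry for you. Making nvidia the default runtime is generally not required when the container is started with --gpus all. For AMD, verify that the --device flags are correctly passed.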
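Whether Docker knows about the NVIDIA runtime can be read from docker info, which lists registered runtimes on a Runtimes: line. The check below is a minimal sketch; the helper name is hypothetical.

```shell
# Hypothetical helper: succeed if `docker info` text on stdin lists an nvidia runtime.
has_nvidia_runtime() {
  grep -E '^[[:space:]]*Runtimes:' | grep -qw 'nvidia'
}

# Demo with a sample line in the shape docker info prints:
if printf '%s\n' ' Runtimes: io.containerd.runc.v2 nvidia runc' | has_nvidia_runtime; then
  echo "nvidia runtime registered"
else
  echo "nvidia runtime missing"
fi

# On a real host:
#   docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"
```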

Check Container Logs

When running Ollama in a container, logs go to stdout/stderr inside the container:

bash

docker logs ollama

Look for the same GPU discovery messages as on bare metal.

Troubleshooting Checklist

Run Ollama outside Docker first — If the GPU works on bare metal but not in Docker, the issue is container configuration, not the GPU itself.

Update Ollama — Some GPU detection bugs are fixed in newer releases. Check the GitHub releases page for the latest version.

Check CUDA / ROCm version compatibility — Ollama's pre-built runners target specific CUDA and ROCm versions. If your driver is too old, GPU detection may fail.

Test with a small model — Use a small model like llama3.2:1b or phi3:mini for quicker testing during troubleshooting.

Monitor GPU usage — Open a second terminal and run watch -n 1 nvidia-smi (NVIDIA) or radeontop (AMD) to see real-time GPU utilization.

Verifying That the Fix Worked

Once you apply the relevant fix:

1. Start Ollama server with OLLAMA_DEBUG=1

2. Look for log messages confirming GPU discovery (e.g., "inference compute" with the GPU library listed instead of cpu)

3. Pull and run a model: ollama run llama3.2:1b

4. In another terminal, run nvidia-smi or radeontop and watch for VRAM consumption and GPU core utilization spiking

If you see the GPU being used, the fix is successful.
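Ollama can also report placement directly: ollama ps lists loaded models with a PROCESSOR column. The hypothetical helper below is a rough sketch that summarizes that column. It assumes the values look like 100% GPU, 100% CPU, or a CPU/GPU split, which may differ between Ollama versions.

```shell
# Hypothetical helper: summarize the PROCESSOR column of `ollama ps` text on stdin.
summarize_processor() {
  awk 'NR > 1 {
    if ($0 ~ /100% GPU/)      state = "fully on GPU"
    else if ($0 ~ /100% CPU/) state = "CPU only"
    else if ($0 ~ /CPU\/GPU/) state = "split between CPU and GPU"
    else                      state = "unknown"
    print $1 ": " state
  }'
}

# Demo with a made-up table (the header row is skipped):
printf '%s\n' \
  'NAME           ID        SIZE      PROCESSOR    UNTIL' \
  'llama3.2:1b    abc123    2.2 GB    100% GPU     4 minutes from now' | summarize_processor

# On a real host, while a model is loaded:
#   ollama ps | summarize_processor
```

A model reported as split between CPU and GPU usually means it did not fit entirely in VRAM, which is a capacity question rather than a detection failure.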

Conclusion

Getting Ollama to use your GPU instead of the CPU is often a matter of proper driver installation, correct permissions, and — when using Docker — the right container runtime and device passthrough configuration. For NVIDIA, the NVIDIA Container Toolkit is the linchpin. For AMD, group membership and device file access are the usual culprits.

Self-hosting local AI models on your own hardware gives you privacy, control, and zero API costs. Watching your GPU finally light up under load is deeply satisfying. Work through the checklist above, consult the official Ollama documentation at docs.ollama.com for the latest supported GPU architectures, and you will have your local LLM humming on the right hardware in no time.
