GPU Not Used in Ollama? Full Fix Guide (NVIDIA / AMD) 2025
Is Ollama running on CPU instead of your GPU? Step-by-step troubleshooting for NVIDIA and AMD GPUs on Linux, Docker, and bare metal.
So you installed Ollama on your homelab server or desktop, pulled a model like Llama 3 or Mistral, and started chatting — only to realize your GPU is sitting idle while the CPU is taking all the heat. If you have opened nvidia-smi and seen zero utilization, or your AMD GPU shows no activity during inference, you are not alone. This is one of the most common pain points in self-hosting local AI.
This guide covers the verified fixes for both NVIDIA and AMD GPUs across bare-metal Linux installations and Docker containers, based on the official Ollama documentation and community troubleshooting resources.
Ollama automatically detects compatible GPUs at startup. According to the official Ollama hardware support documentation, the server logs contain GPU discovery information. When you run ollama serve, the log includes a line such as "looking for compatible GPUs" and, if detection fails, "no compatible GPUs were discovered".
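The most common reasons for GPU fallback are a missing or broken GPU driver, a Docker container started without GPU access (no NVIDIA Container Toolkit, or no device passthrough for AMD), permission problems on the AMD /dev/kfd device, an AMD GPU architecture that Ollama's bundled ROCm libraries do not support, and the GPU disappearing after a suspend/resume cycle.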
Let us go through the fixes step by step.
Before diving into fixes, make sure you have:
Set the environment variable OLLAMA_DEBUG=1 before starting the Ollama server. According to the Ollama troubleshooting guide, this reveals more detailed error codes during GPU discovery.
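OLLAMA_DEBUG=1 ollama serve
Look for log lines that mention GPU detection. If you see "no compatible GPUs were discovered", the server could not find usable hardware. If Ollama is already running as a systemd service, stop it first (sudo systemctl stop ollama) so this foreground instance can bind to port 11434.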
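Scanning a long debug log by eye is tedious, so here is a minimal sketch that classifies a log using the messages quoted in this guide. The function name diagnose_ollama_log is hypothetical, and the exact message text (including the library= field on the "inference compute" line) is an assumption about current log wording that may change between Ollama releases.

```shell
# Sketch: classify an Ollama server log (read on stdin) using the GPU discovery
# messages quoted in this guide. Message wording and the library= field are
# assumptions about current releases; matching on library= is case-insensitive.
diagnose_ollama_log() {
  log=$(cat)
  if printf '%s\n' "$log" | grep -q -e 'failed to check permission on /dev/kfd' \
      -e 'permission problems block access'; then
    echo "amd-permissions"           # the server cannot open /dev/kfd
  elif printf '%s\n' "$log" | grep -q 'amdgpu is not supported'; then
    echo "amd-unsupported-arch"      # gfx target missing from the bundled ROCm build
  elif printf '%s\n' "$log" | grep -q 'no compatible GPUs were discovered'; then
    echo "no-gpu-found"              # discovery ran but found nothing usable
  elif printf '%s\n' "$log" | grep 'inference compute' | grep -q -i -E 'library=(cuda|rocm)'; then
    echo "gpu-ok"                    # a GPU backend was selected
  elif printf '%s\n' "$log" | grep 'inference compute' | grep -q -i 'library=cpu'; then
    echo "cpu-only"                  # the server settled on the CPU backend
  else
    echo "unknown"                   # nothing recognised; read the log by hand
  fi
}
```

For the systemd service you could pipe in journalctl -u ollama --no-pager; for Docker, docker logs ollama 2>&1. The checks run from most to least specific, so a permission error is reported even if the log later says no GPUs were found.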
For NVIDIA:
nvidia-smi
This command shows your GPU model, driver version, CUDA version, and memory usage. If this command fails, the NVIDIA driver is not installed correctly on the host.
For AMD:
sudo dmesg | grep -i amdgpu
sudo dmesg | grep -i kfd
These commands check kernel messages for AMD GPU and ROCk (KFD) driver errors, as recommended in the Ollama troubleshooting documentation.
If you run Ollama in a Docker container, first confirm that Docker itself can reach the GPU:
docker run --gpus all ubuntu nvidia-smi
As noted in the official Ollama troubleshooting page, if this command does not work, Ollama will not be able to see your NVIDIA GPU inside the container.
Ollama requires a CUDA-compatible NVIDIA driver. On Linux, install the proprietary NVIDIA driver from your distribution's package manager or directly from NVIDIA's Linux driver page. After installation, reboot and verify with:
nvidia-smi
If nvidia-smi reports an error like "No devices were found", your driver installation is incomplete or the GPU is not properly seated.
When you run Ollama in a Docker container with an NVIDIA GPU, the NVIDIA Container Toolkit is mandatory. Install it on the host system; the official NVIDIA documentation covers the installation steps for various Linux distributions. After installing the toolkit, register it with Docker (NVIDIA's install guide uses sudo nvidia-ctk runtime configure --runtime=docker), restart the Docker daemon, and verify with:
docker run --rm --gpus all ubuntu nvidia-smi
Once the NVIDIA Container Toolkit is working, run Ollama with the --gpus all flag:
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
According to the Ollama troubleshooting guide, if Ollama initially works on the GPU but later switches to the CPU with "GPU discovery failures" in the server log, the cause may be systemd cgroup management in Docker. The documented fix is to add "exec-opts": ["native.cgroupdriver=cgroupfs"] to /etc/docker/daemon.json on the host and restart Docker.
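The cgroup fix from the Ollama troubleshooting guide is a single daemon.json key, "exec-opts": ["native.cgroupdriver=cgroupfs"]. The sketch below writes it, under the assumption that the host has no daemon.json yet; the helper name write_cgroupfs_daemon_json is hypothetical. Because daemon.json often already holds other settings (such as the NVIDIA runtime entry), the helper refuses to overwrite an existing file rather than guess how to merge.

```shell
# Hypothetical helper: create a Docker daemon.json that switches Docker to the
# cgroupfs cgroup driver. Refuses to clobber an existing file, since that file
# may contain other settings that must be merged by hand.
write_cgroupfs_daemon_json() {
  target=${1:-/etc/docker/daemon.json}
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target; add the exec-opts key to it manually" >&2
    return 1
  fi
  cat > "$target" <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
}
```

Run it as root with no argument to target /etc/docker/daemon.json, then restart Docker (sudo systemctl restart docker) and the Ollama container so the new driver takes effect.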
Ollama supports the CUDA_VISIBLE_DEVICES environment variable. To force Ollama to use a specific GPU:
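CUDA_VISIBLE_DEVICES=0 ollama serve
Replace 0 with the appropriate GPU index from nvidia-smi output. The Ollama GPU documentation notes that numeric IDs may change order, so GPU UUIDs (listed by nvidia-smi -L) are more reliable.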
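On a multi-GPU box it can help to pick the card by UUID rather than by index, since index ordering can vary. This is a small sketch assuming you want the GPU with the most memory; the function name largest_gpu_uuid and the "most VRAM wins" rule are illustrative choices, not something Ollama does itself. It reads the CSV that nvidia-smi's --query-gpu mode prints.

```shell
# Sketch: print the UUID of the GPU with the most memory, reading the CSV that
#   nvidia-smi --query-gpu=uuid,memory.total --format=csv,noheader,nounits
# produces (one "UUID, MiB" row per GPU). Picking by memory is an assumption.
largest_gpu_uuid() {
  awk -F', *' '
    BEGIN { best = -1 }
    NF >= 2 && $2 + 0 > best { best = $2 + 0; uuid = $1 }  # keep the largest memory.total
    END { if (uuid == "") exit 1; print uuid }              # no GPU rows: fail
  '
}
```

Usage: CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=uuid,memory.total --format=csv,noheader,nounits | largest_gpu_uuid) ollama serve. The function exits non-zero when it reads no GPU rows, so an empty variable is easy to detect in a wrapper script.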
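The official Ollama GPU documentation covers a known Linux issue where the NVIDIA GPU is lost after a suspend/resume cycle and Ollama falls back to the CPU. Its suggested workaround is to reload the NVIDIA UVM driver with sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm; restarting the Ollama service after resume also typically restores GPU access.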
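The suspend/resume workaround can be wrapped in a small script to run after waking the machine. This sketch assumes Ollama runs as the systemd service named ollama (as set up by the Linux install script) and that it must be stopped first so the nvidia_uvm module is not in use; the names restore_ollama_gpu, run_or_print, and the DRY_RUN switch are hypothetical.

```shell
# Sketch: apply the nvidia_uvm reload workaround after a resume (run as root).
# Assumes the systemd service is called "ollama". DRY_RUN=1 prints the commands
# instead of executing them.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

restore_ollama_gpu() {
  run_or_print systemctl stop ollama            # free the GPU so the module can unload
  run_or_print rmmod nvidia_uvm || return 1     # unload the NVIDIA UVM module
  run_or_print modprobe nvidia_uvm || return 1  # load it again
  run_or_print systemctl start ollama           # GPU discovery reruns at startup
}
```

Try DRY_RUN=1 restore_ollama_gpu first to see exactly which commands would run, then call it as root without the variable.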
Ollama uses ROCm for AMD GPU acceleration on Linux. Install the AMDGPU driver and ROCm stack from AMD's official Linux driver page; the Ollama documentation recommends the AMD Linux drivers from that official source.
Check that the amdgpu kernel module is loaded:
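lsmod | grep amdgpu
A very common error with AMD GPUs is a permission problem that blocks access to /dev/kfd. The Ollama logs show:
amdgpu devices detected but permission problems block access
error="failed to check permission on /dev/kfd: open /dev/kfd: invalid argument"
Fix this by adding your user to the render and video groups:
sudo usermod -aG render,video $USER
Log out and back in (or reboot) for the changes to take effect. If Ollama runs as the systemd service created by the Linux install script, the server runs as the ollama user, so that account needs the same groups (sudo usermod -aG render,video ollama), followed by sudo systemctl restart ollama.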
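To check group membership before and after the fix, here is a minimal sketch that reports which of the two groups this guide names (render and video) are missing from a space-separated group list. The function name missing_gpu_groups is hypothetical; it only inspects the string you pass in.

```shell
# Sketch: print which of the groups needed for /dev/kfd and /dev/dri access
# (render and video) are missing from a space-separated group list.
# Exit status is 0 only when nothing is missing.
missing_gpu_groups() {
  missing=""
  for needed in render video; do
    case " $1 " in
      *" $needed "*) ;;                 # already a member
      *) missing="$missing $needed" ;;  # record the missing group
    esac
  done
  missing=${missing# }                  # trim the leading space
  [ -z "$missing" ] || echo "$missing"
  [ -z "$missing" ]
}
```

Pass it the output of id -nG for the account that actually runs the server, for example missing_gpu_groups "$(id -nG ollama)". With a user name, id reads the group database, so it reflects a usermod change even before that user logs in again.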
If your AMD GPU's architecture is not among the targets Ollama's bundled ROCm libraries were built for, the server will log:
amdgpu is not supported gpu_type=gfx1103
supported_types="[gfx1030 gfx1100 gfx1101 gfx1102 ...]"
Ollama's documentation points to the HSA_OVERRIDE_GFX_VERSION environment variable for overriding GPU architecture detection. This is covered in the Ollama GPU documentation under "Overrides on Linux".
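For example, to run a gfx1103 GPU with the supported gfx1100 libraries:
HSA_OVERRIDE_GFX_VERSION=11.0.0 ollama serve
The value uses x.y.z syntax; the documentation's own example forces an RX 5400 (gfx1034) onto the supported gfx1030 target with HSA_OVERRIDE_GFX_VERSION=10.3.0. Check the official documentation for the exact override for your specific GPU.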
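Working out the x.y.z value by hand is error-prone, so here is a heuristic sketch, not taken from the Ollama docs. It converts the gpu_type from the "amdgpu is not supported" log line into an override by keeping the major and minor digits and setting the final stepping digit to 0. That reproduces the documented gfx1034 → 10.3.0 case, but a supported family target does not always exist. The function name suggest_hsa_override is hypothetical; the optional second argument is the supported_types value from the same log line.

```shell
# Heuristic sketch: gfx target (e.g. gfx1103) -> HSA_OVERRIDE_GFX_VERSION value
# with the stepping digit zeroed (11.0.0). Optionally warns when the implied
# target is absent from the log's supported_types list.
suggest_hsa_override() {
  digits=${1#gfx}                       # gfx1103 -> 1103
  case $digits in
    ''|*[!0-9a-fA-F]*) echo "unrecognized target: $1" >&2; return 1 ;;
  esac
  rest=${digits%?}                      # drop the stepping digit: 1103 -> 110
  major=${rest%?}                       # all but the minor digit: 110 -> 11
  minor=${rest#"$major"}                # the minor digit (hex):   110 -> 0
  case $major in
    ''|*[!0-9]*) echo "unrecognized target: $1" >&2; return 1 ;;
  esac
  target="gfx$major${minor}0"           # the gfx target the override points at
  if [ -n "$2" ]; then
    list=${2#\[}                        # strip the surrounding [ ]
    list=${list%\]}
    case " $list " in
      *" $target "*) ;;
      *) echo "warning: $target is not in the supported list" >&2 ;;
    esac
  fi
  echo "$major.$(printf '%d' "0x$minor").0"
}
```

Example: suggest_hsa_override gfx1103 "[gfx1030 gfx1100 gfx1101 gfx1102]" prints 11.0.0 with no warning, because gfx1100 is in the list. Treat the result as a starting point to test, not a guarantee that the override will work.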
When running Ollama in a Docker container with AMD GPUs, you must pass through the device files. According to the official ROCm Docker documentation, use:
docker run -d --device=/dev/kfd --device=/dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
Note the :rocm tag on the Ollama image, which contains the ROCm runtime libraries.
To select specific AMD GPUs, set HIP_VISIBLE_DEVICES or ROCR_VISIBLE_DEVICES to a comma-separated list of devices. Both variables appear in the server configuration that Ollama logs at startup, and rocminfo lists the available devices.
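Variables prefixed on an interactive command line, such as HSA_OVERRIDE_GFX_VERSION=11.0.0 ollama serve, only affect that foreground process. When Ollama runs as a systemd service, they belong in the unit's environment instead, which is what systemctl edit ollama.service produces. The sketch below writes an equivalent drop-in file directly; the helper name write_ollama_env_dropin and the file name gpu-env.conf are assumptions (systemd reads any *.conf file in the service's .d directory), and an existing gpu-env.conf is replaced.

```shell
# Hypothetical helper: write KEY=VALUE pairs into a systemd drop-in for the
# Ollama service, one Environment= line per pair.
write_ollama_env_dropin() {
  [ $# -ge 1 ] || { echo "usage: write_ollama_env_dropin DIR KEY=VALUE..." >&2; return 1; }
  dropin_dir=$1
  shift
  mkdir -p "$dropin_dir" || return 1
  {
    echo "[Service]"
    for kv in "$@"; do
      printf 'Environment="%s"\n' "$kv"   # quote so values with spaces survive
    done
  } > "$dropin_dir/gpu-env.conf"
}
```

For example, as root: write_ollama_env_dropin /etc/systemd/system/ollama.service.d HSA_OVERRIDE_GFX_VERSION=11.0.0 ROCR_VISIBLE_DEVICES=0, followed by systemctl daemon-reload and systemctl restart ollama.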
For NVIDIA, check that /etc/docker/daemon.json registers the nvidia runtime, and make it the default runtime there if you start containers without the --gpus all flag. For AMD, verify that the --device flags are passed correctly.
When Ollama runs in a container, its server logs go to the container's stdout/stderr, so read them with:
docker logs ollama
Look for the same GPU discovery messages as on bare metal.
Use a small model such as llama3.2:1b or phi3:mini for quicker testing during troubleshooting.
Run watch -n 1 nvidia-smi (NVIDIA) or radeontop (AMD) to see real-time GPU utilization.
Once you apply the relevant fix:
1. Start Ollama server with OLLAMA_DEBUG=1
2. Look for log messages confirming GPU discovery (e.g., "inference compute" with the GPU library listed instead of cpu)
3. Pull and run a model: ollama run llama3.2:1b
4. In another terminal, run nvidia-smi or radeontop and watch for VRAM consumption and GPU core utilization spiking
If you see the GPU being used, the fix is successful.
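Besides watching utilization, ollama ps reports where each loaded model lives: its PROCESSOR column shows values such as 100% GPU, 100% CPU, or a split like 48%/52% CPU/GPU. The sketch below turns that table into one word per model. The function name processor_placement is hypothetical, and it assumes the header row comes first and the model name is the first column.

```shell
# Sketch: summarise the PROCESSOR column of `ollama ps` (read on stdin) as
# "<model> gpu|split|cpu|unknown", skipping the header row.
processor_placement() {
  awk 'NR > 1 && NF > 0 {
    if ($0 ~ /100% GPU/)      p = "gpu"      # fully in VRAM
    else if ($0 ~ /CPU\/GPU/) p = "split"    # partially offloaded to the CPU
    else if ($0 ~ /100% CPU/) p = "cpu"      # fell back to system RAM
    else                      p = "unknown"
    print $1, p
  }'
}
```

Run ollama ps | processor_placement while a model is loaded. A result of split usually means the model does not fit entirely in VRAM, which is a capacity issue rather than a detection failure.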
Getting Ollama to use your GPU instead of the CPU is often a matter of proper driver installation, correct permissions, and — when using Docker — the right container runtime and device passthrough configuration. For NVIDIA, the NVIDIA Container Toolkit is the linchpin. For AMD, group membership and device file access are the usual culprits.
Self-hosting local AI models on your own hardware gives you privacy, control, and zero API costs. Watching your GPU finally light up under load is deeply satisfying. Work through the checklist above, consult the official Ollama documentation at docs.ollama.com for the latest supported GPU architectures, and you will have your local LLM humming on the right hardware in no time.