System requirements

While it is possible to run a local LLM on almost any device, you will need the following for the best experience:

  • At least 16GB of RAM.
  • A modern processor supporting AVX2 instructions. If your processor was manufactured in 2013 or later (Intel Haswell and newer; AMD support followed a couple of years later), it almost certainly supports this instruction set (see the check sketched after this list).
  • Optional, but highly recommended: a dedicated NVIDIA or AMD graphics card, with at least 8GB of VRAM.

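If you want to verify these requirements before installing anything, a quick check is possible from Python. The following is a minimal sketch, assuming the third-party psutil and py-cpuinfo packages are installed (pip install psutil py-cpuinfo); the 16GB threshold simply mirrors the recommendation above.

```python
import psutil
import cpuinfo

# Total system RAM, in GB.
ram_gb = psutil.virtual_memory().total / 1024**3

# CPU feature flags reported by py-cpuinfo; AVX2 shows up as "avx2".
flags = cpuinfo.get_cpu_info().get("flags", [])

print(f"System RAM: {ram_gb:.1f} GB "
      f"({'meets' if ram_gb >= 16 else 'below'} the 16 GB recommendation)")
print(f"AVX2 support: {'yes' if 'avx2' in flags else 'no'}")
```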
The more powerful your computer, the larger the models you can run locally. There are three main ways to run a local LLM, and which one is practical for you depends on how much memory you have and whether it is VRAM or system RAM:

GPU inference: The fastest, and the preferred, method is to load the model entirely into VRAM (your graphics card's memory). This requires a dedicated graphics card.

The amount of VRAM you have available determines the size of the model you can load. Treat 8GB of VRAM as the practical minimum and 24GB as optimal.
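As a rough rule of thumb, a model's memory footprint is its parameter count multiplied by the bytes per parameter of its quantization, plus some headroom for the context window and runtime buffers. The sketch below illustrates that arithmetic; the 20% overhead factor is an assumption for illustration, not a measured value.

```python
def estimated_vram_gb(params_billions: float,
                      bits_per_param: int = 4,
                      overhead: float = 1.2) -> float:
    """Rule-of-thumb memory estimate: parameters x bytes per parameter,
    scaled by an assumed overhead factor for context and runtime buffers."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param * overhead

# A 7B model at 4-bit quantization fits comfortably in 8 GB of VRAM,
# while a 33B model at 4-bit needs roughly 20 GB, i.e. a 24 GB card.
print(f"7B  @ 4-bit: ~{estimated_vram_gb(7):.1f} GB")   # ~4.2 GB
print(f"33B @ 4-bit: ~{estimated_vram_gb(33):.1f} GB")  # ~19.8 GB
```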

Did you know?

You can install multiple graphics cards into your system to maximize the amount of VRAM available for AI-related tasks.

GPU + CPU inference: This method uses both your VRAM and RAM to load the model. Offloading part of the model to system memory lets you load larger models, at a significant performance penalty. For this method, count your total memory as VRAM plus RAM, and assume a minimum of 16GB combined.
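With llama.cpp-based runtimes, this split is controlled by how many layers you offload to the GPU. Below is a minimal sketch using the llama-cpp-python bindings; the model path and layer count are placeholders, not recommendations for a specific model.

```python
from llama_cpp import Llama

# n_gpu_layers controls the VRAM/RAM split: -1 loads every layer onto the
# GPU, 0 keeps the whole model in system RAM, and any value in between
# offloads that many layers to VRAM and leaves the rest on the CPU.
llm = Llama(
    model_path="./models/example-7b-q4.gguf",  # placeholder path
    n_gpu_layers=20,                           # offload 20 layers to the GPU
    n_ctx=4096,                                # context window size
)

out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```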

CPU-only inference: This method uses your CPU and RAM for inference. It is by far the slowest option, but also the cheapest in terms of hardware. Assume at least 16GB of RAM.
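For CPU-only inference, the same bindings work with GPU offloading disabled; the thread count is the main tuning knob. The path and thread count below are placeholders.

```python
from llama_cpp import Llama

# CPU-only: no layers offloaded to a GPU. n_threads should roughly match
# the number of physical cores on the machine.
llm = Llama(
    model_path="./models/example-7b-q4.gguf",  # placeholder path
    n_gpu_layers=0,
    n_threads=8,  # adjust to your core count
)
```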