# Gemma 4 on Windows
## First question: what hardware class is this?

On Windows, the biggest split is between:
- RTX desktop or laptop with meaningful VRAM
- CPU-only or low-VRAM machine
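A quick way to tell the two classes apart is to check reported VRAM. The sketch below assumes the output format of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (one line per GPU, total memory in MiB); the 8 GiB cutoff for "meaningful VRAM" is an illustrative assumption, not an official threshold.

```python
def classify_hardware(nvidia_smi_output: str, min_vram_gib: float = 8.0) -> str:
    """Classify a machine from nvidia-smi memory.total output (MiB per line)."""
    lines = [ln.strip() for ln in nvidia_smi_output.splitlines() if ln.strip()]
    if not lines:
        return "cpu-only"  # no NVIDIA GPU reported
    vram_gib = max(int(ln) for ln in lines) / 1024  # MiB -> GiB
    return "rtx-class" if vram_gib >= min_vram_gib else "low-vram"

print(classify_hardware("12288"))  # e.g. a 12 GB RTX card -> "rtx-class"
```

If the classifier returns anything other than `rtx-class`, plan around the CPU-only / low-VRAM recommendations below.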
## Best runtime choices

- LM Studio: easiest entry point
- llama.cpp: best if you want direct control
- Ollama: useful if your goal is a local API; not the easiest first-time route
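If you do go the Ollama route, the payoff is its standard local HTTP API on port 11434. The sketch below shows a minimal client for the `/api/generate` route; the model tag `"gemma"` is a placeholder, so substitute whatever tag `ollama list` actually shows on your machine.

```python
import json
import urllib.request

def build_request(prompt: str, model: str = "gemma") -> dict:
    # "gemma" is a placeholder tag -- use the tag shown by `ollama list`.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str, model: str = "gemma") -> str:
    """Send one non-streaming generate request to a local Ollama server."""
    payload = json.dumps(build_request(prompt, model)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

This only works when the Ollama service is running locally; LM Studio users never need to touch an endpoint like this.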
## Recommended model choices

- E2B / E4B: safest for broad compatibility
- 26B A4B: realistic only once memory and runtime are clearly under control
- 31B: only for genuinely strong local setups
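To judge which tier is realistic for your machine, a back-of-the-envelope memory estimate helps: parameter count times bytes per weight at the chosen quantization. The ~10% overhead factor for KV cache and runtime buffers below is an assumption for illustration, not a measured figure.

```python
def estimate_memory_gb(params_b: float, bits: int, overhead: float = 1.10) -> float:
    """Rough footprint of a quantized model: billions of params x bytes each,
    plus an assumed ~10% for KV cache and runtime buffers."""
    weights_gb = params_b * bits / 8
    return round(weights_gb * overhead, 1)

print(estimate_memory_gb(4, 4))   # a small model at 4-bit
print(estimate_memory_gb(31, 4))  # the 31B tier at 4-bit
```

Compare the estimate against your free VRAM (or system RAM for CPU inference); if it does not fit with headroom to spare, drop to a smaller tier or a lower-bit quantization.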
## Fastest decision tree

- If you are new, use LM Studio.
- If you are memory constrained, start with E2B.
- If you want a local service endpoint, test Ollama.
- If the model fails to fit, open the Out of Memory section.
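The decision tree above can be sketched as a small function. The flag names and returned advice strings are illustrative labels, not real tool options.

```python
def recommend(new_user: bool, memory_constrained: bool, wants_api: bool) -> list[str]:
    """Map the decision-tree questions to recommendations; multiple can apply."""
    advice = []
    if new_user:
        advice.append("runtime: LM Studio")
    if memory_constrained:
        advice.append("model: start with E2B")
    if wants_api:
        advice.append("runtime: test Ollama")
    if not advice:
        advice.append("experienced setup: pick freely; see Out of Memory if it fails to fit")
    return advice

print(recommend(new_user=True, memory_constrained=True, wants_api=False))
```

Note the branches are not mutually exclusive: a new, memory-constrained user gets both the LM Studio and the E2B recommendation.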