Getting Started

This site is built around one rule: follow the setup order that keeps working in real Gemma 4 community threads, not the one suggested by benchmark charts.

  1. Start with the easiest runtime for your device instead of the most configurable one.
  2. Start with the smallest model that has a real chance of fitting comfortably.
  3. Keep the first run short. One clean reply matters more than a huge context test.
  4. Confirm that the device is actually using the accelerator path you expect.
  5. Only after short chat works should you test long context, vision, or tool calling.
  6. If agentic use matters, update the runtime and template before you decide the model is bad.
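
Steps 3 and 4 can be combined into one quick check: time a single short reply and look at the tokens-per-second figure. The sketch below is runtime-agnostic and only an illustration. `generate` is a hypothetical callable that you would wrap around whatever runtime you actually use, and `fake_generate` is a stand-in so the example runs on its own. A rate far below what others report for the same device and model is a hint, not proof, that the runtime has fallen back to CPU.

```python
import time
from typing import Callable, Dict, List


def measure_short_run(
    generate: Callable[[str, int], List[str]],
    prompt: str = "Say hello in one sentence.",
    max_tokens: int = 32,
) -> Dict[str, float]:
    """Time one short reply and report how many tokens per second it produced."""
    start = time.perf_counter()
    tokens = generate(prompt, max_tokens)  # hypothetical wrapper around your runtime
    elapsed = time.perf_counter() - start
    count = len(tokens)
    rate = count / elapsed if elapsed > 0 else float("inf")
    return {"tokens": count, "seconds": elapsed, "tokens_per_second": rate}


def fake_generate(prompt: str, max_tokens: int) -> List[str]:
    """Stand-in for a real runtime call so the sketch is self-contained."""
    return ["hello"] * min(max_tokens, 8)


if __name__ == "__main__":
    result = measure_short_run(fake_generate)
    print(result)
```

Keep the prompt and `max_tokens` small on purpose: the goal is one clean reply and a rough speed reading, not a context stress test.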

Answer these four questions first:

  1. Is your target device a phone, tablet, laptop, or desktop?
  2. How much RAM, VRAM, or unified memory does it actually have?
  3. Do you want the easiest UI, or the most control?
  4. Are you optimizing for speed, quality, or just proving that it runs?

Where people go wrong:

  • They download 31B or an aggressive desktop quant before checking whether the machine can even carry it.
  • They assume a phone GPU or NPU is active when the app has silently fallen back to CPU.
  • They judge Gemma 4 through an old desktop backend, stale template, or older quant build.
  • They jump into long context, agent tools, or vision before proving that short chat is stable.

Example setups:

  • Phone daily driver: AI Edge Gallery plus E2B is still the safest mobile proof. It often feels better than forcing E4B on a phone that is not accelerating cleanly.
  • 16GB desktop GPU: 26B A4B can work, but only if you treat quant choice, context length, and vision overhead as first-class setup decisions.
  • Repurposed mobile hardware: a stripped Android phone can become a small local inference node, but the community keeps pushing these builds toward llama.cpp once performance starts to matter.
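
A rough fit check makes the quant decision concrete before you download anything. The sketch below estimates weight memory only, as parameter count times bits per weight. It rests on several assumptions: the parameter counts are read off the model names (26 and 31 billion), all 26B weights of the A4B model are assumed to be resident in memory, real quant files usually use somewhat more than their nominal bits per weight, and `headroom_gib` is an arbitrary placeholder for KV cache, vision overhead, and runtime buffers.

```python
def estimate_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


def fits(
    params_billion: float,
    bits_per_weight: float,
    memory_gib: float,
    headroom_gib: float = 2.0,  # assumed allowance for context and overhead
) -> bool:
    """Return True if the weights plus the assumed headroom fit in memory."""
    needed = estimate_weight_gib(params_billion, bits_per_weight) + headroom_gib
    return needed <= memory_gib


if __name__ == "__main__":
    # Nominal sizes taken from the model names; treat them as rough inputs.
    for name, params in [("26B A4B", 26.0), ("31B", 31.0)]:
        for bits in (4, 8):
            size = estimate_weight_gib(params, bits)
            verdict = "fits" if fits(params, bits, memory_gib=16.0) else "does not fit"
            print(f"{name} at {bits} bits: about {size:.1f} GiB of weights, {verdict} in 16 GiB")
```

The margin on a 16GB card is thin even in the favorable case, which is why context length and vision overhead belong in the same decision as the quant.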

Running Gemma 4 locally means inference is happening on your device. It does not mean the model has live web access. The official Gemma 4 model card lists the training cutoff as January 2025.