Getting Started
This site is built around one rule: start with your device, not with benchmark charts.
Before you download anything
Answer these four questions first:
- Is your target device a phone, tablet, laptop, or desktop?
- How much RAM or unified memory does it actually have?
- Do you want the easiest UI, or the most control?
- Are you optimizing for speed, quality, or just proving that it runs?
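The second question is the one people most often guess at. As a quick sanity check, the snippet below reads total physical RAM from the OS; this is a minimal sketch that works on Linux and macOS only (on Windows, check Task Manager under Performance > Memory instead).

```python
import os
import platform

def total_ram_gb() -> float:
    """Approximate total physical RAM in GiB (Linux and macOS only)."""
    if platform.system() in ("Linux", "Darwin"):
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
        return pages * page_size / (1024 ** 3)
    # Not portable to Windows; use Task Manager there.
    raise NotImplementedError("Check RAM via Task Manager on Windows")

print(f"Total RAM: {total_ram_gb():.1f} GiB")
```

Remember that the model is not the only thing using memory: the OS and other apps need room too, so treat the number you get as an upper bound, not a budget.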
Fastest paths by device
- Android: start with AI Edge Gallery.
- iPhone and iPad: start with the iPhone and iPad guide.
- Mac: start with the Mac guide and decide between LM Studio and llama.cpp.
- Windows: start with the Windows guide and check VRAM before anything else.
Fastest paths by runtime preference
- I want the easiest desktop app: LM Studio
- I want the most control: llama.cpp
- I want a local API: Ollama
- I want a phone-first experience: AI Edge Gallery
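To show what the "local API" option looks like in practice, here is a minimal sketch of calling Ollama's HTTP endpoint, which listens on `localhost:11434` by default. The model tag passed to `generate` is an illustrative placeholder; use whatever `ollama list` reports on your machine.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the response text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server and a pulled model):
# print(generate("your-gemma-model-tag", "Say hello in five words."))
```

Because everything goes over localhost, this works fully offline once the model is pulled; no request ever leaves your machine.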
Recommended order
1. Identify your device class and memory budget.
2. Use the model picker to narrow the Gemma 4 size.
3. Choose the easiest runtime that supports that hardware well.
4. If it fails, go straight to troubleshooting instead of trying random downloads.
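The first two steps above can be sketched as a simple mapping from memory budget to model size. The thresholds below are illustrative placeholders, not official requirements; use the model picker on this site for real numbers.

```python
def pick_gemma_size(ram_gb: float) -> str:
    """Map a memory budget (GiB) to a Gemma 4 size.

    Thresholds are illustrative assumptions for the sketch,
    not official hardware requirements.
    """
    if ram_gb >= 32:
        return "31B"
    if ram_gb >= 24:
        return "26B A4B"
    if ram_gb >= 8:
        return "E4B"
    return "E2B"

print(pick_gemma_size(16))  # a 16 GiB laptop lands on a mid-size option here
```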
What “offline” actually means here
Running Gemma 4 locally means inference happens on your device. It does not mean the model has live web access. The official Gemma 4 model card lists the training cutoff as January 2025.
Next: pick a model size. Use E2B, E4B, 26B A4B, or 31B based on hardware and intent.