Gemma 4 Ollama

The fastest practical local path for many Gemma 4 users, with the baseline commands and tags you need first.
Apr 3, 2026

Why people search this

For many users, Ollama is the fastest honest answer to “how do I run Gemma 4 locally without building an entire stack first?”

Baseline commands

ollama --version
ollama pull gemma4
ollama list
ollama run gemma4 "roses are red"

The official Google AI for Developers integration page also lists the available tags:

  • gemma4:e2b
  • gemma4:e4b
  • gemma4:26b
  • gemma4:31b

When Ollama is the right answer

  • You want a CLI-first start.
  • You are testing prompts and basic local workflows.
  • You want the fastest path from search result to real local output.

When to pick something else

  • Use LM Studio if you want a desktop UI.
  • Use MLX if you are specifically optimizing for Apple silicon workflows.
  • Use vLLM if your real goal is production serving.

Local API (localhost:11434)

# simple text generation
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "roses are red"
}'

# multimodal caption example (entries in "images" must be base64-encoded data, not file paths)
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "caption this image",
  "images": ["<base64-encoded image>"]
}'
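
The same endpoint can be called programmatically. Below is a minimal Python sketch assuming the default localhost:11434 server and an already-pulled gemma4 tag; the build_payload and generate helpers are hypothetical names introduced here, not part of Ollama itself.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(model, prompt, image_path=None):
    """Build a JSON payload for Ollama's /api/generate endpoint.

    Images go in the "images" field as base64-encoded strings,
    not as file paths.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_path is not None:
        with open(image_path, "rb") as f:
            payload["images"] = [base64.b64encode(f.read()).decode("ascii")]
    return payload

def generate(model, prompt, image_path=None):
    """POST the payload and return the model's response text."""
    data = json.dumps(build_payload(model, prompt, image_path)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server and a pulled tag):
# print(generate("gemma4", "roses are red"))
```

Setting "stream": false returns one JSON object instead of a stream of partial chunks, which keeps the client code simple for quick tests.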

Notes and caveats

  • Ollama distributes models as GGUF files, typically in quantized variants that reduce memory and compute requirements.
  • Quantization reduces resource usage but can slightly lower output quality versus full‑precision weights.
  • No model is bundled with Ollama by default; you must ollama pull a tag before you can run it.
  • Updated: 2026‑04‑02 UTC (per Google AI dev docs at the time of writing).
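
To make the quantization trade-off concrete, here is a rough weight-only memory estimate. This is a back-of-the-envelope sketch: real GGUF files add overhead (embeddings, mixed-precision layers, KV cache at runtime), so treat these as order-of-magnitude figures, and the 26B size is taken from the gemma4:26b tag above.

```python
def approx_weight_gb(n_params_billion, bits_per_weight):
    """Rough weight-only memory estimate: params * bits / 8, in GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 26B-parameter model at full precision vs. 4-bit quantization:
fp16 = approx_weight_gb(26, 16)  # ~52 GB
q4 = approx_weight_gb(26, 4)     # ~13 GB
print(f"fp16 ≈ {fp16:.0f} GB, 4-bit ≈ {q4:.0f} GB")
```

This is why quantized variants are the practical default on consumer hardware: a 4x reduction in weight memory is the difference between fitting on a single GPU or workstation and not fitting at all.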

Official references