Gemma 4 Ollama

The fastest practical local path for many Gemma 4 users, with the baseline commands and tags you need first.
Apr 3, 2026

Why people search this

For many users, Ollama is the fastest honest answer to “how do I run Gemma 4 locally without building an entire stack first?”

Baseline commands

ollama --version
ollama pull gemma4
ollama list
ollama run gemma4 "roses are red"

The official Google AI for Developers integration page also lists the available tags:

  • gemma4:e2b
  • gemma4:e4b
  • gemma4:26b
  • gemma4:31b

When Ollama is the right answer

  • You want a CLI-first start.
  • You are testing prompts and basic local workflows.
  • You want the fastest path from search result to real local output.

When to pick something else

  • Use LM Studio if you want a desktop UI.
  • Use MLX if you are specifically optimizing for Apple silicon workflows.
  • Use vLLM if your real goal is production serving.

Local API (localhost:11434)

# simple text generation
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "roses are red"
}'

# multimodal caption example (entries in "images" must be base64-encoded data, not file paths)
curl http://localhost:11434/api/generate -d '{
  "model": "gemma4",
  "prompt": "caption this image",
  "images": ["<base64-encoded image>"]
}'
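
The same endpoint can be called programmatically. Below is a minimal Python sketch assuming the default localhost:11434 server and an already-pulled gemma4 tag; the build_payload and generate helpers are hypothetical names introduced here, not part of Ollama itself.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(model, prompt, image_path=None):
    """Build a JSON payload for Ollama's /api/generate endpoint.

    Images go in the "images" field as base64-encoded strings,
    not as file paths.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    if image_path is not None:
        with open(image_path, "rb") as f:
            payload["images"] = [base64.b64encode(f.read()).decode("ascii")]
    return payload

def generate(model, prompt, image_path=None):
    """POST the payload and return the model's response text."""
    data = json.dumps(build_payload(model, prompt, image_path)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server and a pulled tag):
# print(generate("gemma4", "roses are red"))
```

Setting "stream": false returns one JSON object instead of a stream of partial chunks, which keeps the client code simple for quick tests.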

Notes and caveats

  • Ollama distributes models as GGUF files, typically in quantized variants that reduce memory and compute requirements.
  • Quantization reduces resource usage but can slightly lower output quality versus full‑precision weights.
  • No model is bundled with Ollama by default; you must ollama pull a tag before you can run it.
  • Updated: 2026‑04‑02 UTC (per Google AI dev docs at the time of writing).
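
To make the quantization trade-off concrete, here is a rough weight-only memory estimate. This is a back-of-the-envelope sketch: real GGUF files add overhead (embeddings, mixed-precision layers, KV cache at runtime), so treat these as order-of-magnitude figures, and the 26B size is taken from the gemma4:26b tag above.

```python
def approx_weight_gb(n_params_billion, bits_per_weight):
    """Rough weight-only memory estimate: params * bits / 8, in GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 26B-parameter model at full precision vs. 4-bit quantization:
fp16 = approx_weight_gb(26, 16)  # ~52 GB
q4 = approx_weight_gb(26, 4)     # ~13 GB
print(f"fp16 ≈ {fp16:.0f} GB, 4-bit ≈ {q4:.0f} GB")
```

This is why quantized variants are the practical default on consumer hardware: a 4x reduction in weight memory is the difference between fitting on a single GPU or workstation and not fitting at all.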

Official references