Gemma 4 Model Card

Official model card for Gemma 4, Google DeepMind’s open multimodal language model family. Documents four model sizes (E2B, E4B, 26B A4B MoE, 31B dense), architectural innovations, benchmark performance, training data, and safety evaluations.

Key contributions

Architectural innovations: Per-Layer Embeddings (PLE) for on-device efficiency, hybrid attention (local + global), Mixture-of-Experts with 128 experts, variable image resolution, native audio processing (E2B/E4B).
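
To make the "hybrid attention (local + global)" idea concrete, here is a minimal sketch of how local sliding-window layers and global causal layers differ in which tokens they may attend to. The window size, the number of layers, and the rule that every `global_every`-th layer is global are illustrative assumptions, not values taken from the model card.

```python
def causal_mask(n):
    """Global attention: token i may attend to every token j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, window):
    """Local attention: token i may attend only to the last `window` tokens,
    itself included."""
    return [[(j <= i) and (i - j < window) for j in range(n)] for i in range(n)]


def layer_masks(n, num_layers, window, global_every):
    """Build one mask per layer. Assumption for illustration: every
    `global_every`-th layer is global and the rest are local."""
    masks = []
    for layer in range(num_layers):
        is_global = (layer + 1) % global_every == 0
        masks.append(causal_mask(n) if is_global else sliding_window_mask(n, window))
    return masks


if __name__ == "__main__":
    # Hypothetical configuration: 6 layers, window of 2, every 3rd layer global.
    for idx, mask in enumerate(layer_masks(n=4, num_layers=6, window=2, global_every=3)):
        visible = sum(mask[-1])  # how many tokens the last position can see
        print(f"layer {idx}: last token attends to {visible} token(s)")
```

Local layers keep per-token attention cost bounded by the window, while the occasional global layer lets information travel across the whole context.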

Reasoning capability: Built-in thinking mode with configurable thinking tokens for step-by-step reasoning before answering. Native system prompt support.

Multimodal design: Text, image (variable aspect ratio and resolution), video (frame sequences), and audio (ASR/AST, 30s max). Interleaved multimodal input supported.
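
Given the 30-second audio limit, longer recordings would need to be split before being sent to the model. The sketch below computes segment boundaries only; the 16 kHz sample rate in the usage line and the helper name `split_audio` are assumptions for illustration, not part of the model card.

```python
MAX_AUDIO_SECONDS = 30  # per-input audio limit stated in this summary


def split_audio(num_samples, sample_rate, max_seconds=MAX_AUDIO_SECONDS):
    """Return (start, end) sample-index pairs, each spanning at most
    `max_seconds` of audio. The final segment may be shorter."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    chunk = max_seconds * sample_rate
    return [(start, min(start + chunk, num_samples))
            for start in range(0, num_samples, chunk)]


if __name__ == "__main__":
    # A hypothetical 70-second clip sampled at 16 kHz yields three segments.
    print(split_audio(num_samples=16000 * 70, sample_rate=16000))
```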

Context windows: 128K tokens (E2B/E4B), 256K tokens (26B A4B/31B). Proportional RoPE (p-RoPE) for long-context optimization.

Function calling: Native support for structured tool use, enabling agentic workflows.
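
As a rough illustration of structured tool use, the sketch below dispatches a JSON tool call to a registered Python function. The wire format (`{"name": ..., "arguments": {...}}`), the `get_weather` tool, and the `dispatch` helper are all hypothetical; this summary does not specify the format Gemma 4 actually emits.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool: returns a canned string instead of calling a real API."""
    return f"Sunny in {city}"


# Registry mapping tool names to callables the application is willing to run.
TOOLS = {"get_weather": get_weather}


def dispatch(tool_call_json: str) -> str:
    """Parse a tool call of the assumed form
    {"name": "<tool>", "arguments": {...}} and invoke the matching tool."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
    print(dispatch(model_output))
```

In an agentic loop, the tool's return value would be appended to the conversation so the model can use it in its next turn.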

Deployment range: High-end phones to servers. Dense models (E2B: 2.3B effective params, E4B: 4.5B, 31B: 30.7B) and MoE (26B total, 3.8B active).
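
The MoE figures above imply that only a small share of the 26B A4B model's weights participate in each forward pass. A quick check of that arithmetic, using only the parameter counts quoted in this note (the helper name `active_fraction` is illustrative):

```python
def active_fraction(active_billions, total_billions):
    """Share of parameters that are active per token in an MoE model."""
    if total_billions <= 0:
        raise ValueError("total_billions must be positive")
    return active_billions / total_billions


if __name__ == "__main__":
    frac = active_fraction(3.8, 26.0)
    print(f"26B A4B active share: {frac:.1%}")  # 26B A4B active share: 14.6%
```

Per-token compute tracks the active parameters, though all 26B parameters must still be held in memory.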

Benchmark highlights

Gemma 4 31B achieves 85.2% on MMLU Pro, 89.2% on AIME 2026, 80.0% on LiveCodeBench v6, and a Codeforces Elo of 2150. Because only 3.8B of its parameters are active per token, the 26B A4B MoE runs nearly as fast as a 4B model while approaching the 31B model's performance. The smaller models (E2B/E4B) demonstrate strong on-device capabilities, including audio support.

Training and safety

Training data: web documents in 140+ languages, code, mathematics, and images, with a January 2025 cutoff. CSAM filtering and sensitive-data removal were applied at multiple stages. Safety evaluations aligned with Google AI Principles show major improvements over Gemma 3, with minimal policy violations across text-to-text and image-to-text tasks.

License and availability

Apache 2.0 license. Available via Hugging Face Transformers, compatible with vLLM, SGLang, and standard inference stacks.
