Full release of Gemma 3n, a mobile-first multimodal model for on-device AI with image, audio, video, and text support
Available in two efficient sizes: E2B (5B raw parameters, 2B effective) requiring 2GB of memory and E4B (8B raw, 4B effective) requiring 3GB, enabled by Per-Layer Embeddings
Novel MatFormer architecture provides nested models, custom sizing via Mix-n-Match, and paves the way for elastic inference
KV Cache Sharing doubles prefill performance for long-context streaming applications
Integrated audio encoder based on Universal Speech Model enables on-device speech-to-text and speech translation
New MobileNet-V5-300M vision encoder delivers state-of-the-art multimodal performance with high throughput and low memory footprint
Wide support across major tools and platforms and launch of Gemma 3n Impact Challenge with $150,000 in prizes
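The MatFormer nesting and Mix-n-Match sizing mentioned above can be sketched in a toy form: smaller sub-models reuse a prefix slice of the full model's weights, so they are literally nested inside the larger one, and a custom-sized model is built by choosing a different slice width per layer. This is a minimal pure-Python illustration of the idea; the layer sizes, weights, and `ffn` helper are hypothetical, not Gemma 3n's actual implementation.

```python
# Toy sketch of MatFormer-style nested feed-forward layers (hypothetical,
# not Gemma 3n's real code): every narrower sub-model uses the first
# `width` hidden units of the SAME weight matrices, at no extra storage cost.

def ffn(x, w_in, w_out, width):
    """Feed-forward block restricted to the first `width` hidden units."""
    hidden = [max(0.0, sum(x[i] * w_in[i][j] for i in range(len(x))))
              for j in range(width)]                      # ReLU(x @ W_in[:, :width])
    return [sum(hidden[j] * w_out[j][k] for j in range(width))
            for k in range(len(x))]                       # hidden @ W_out[:width, :]

# Tiny deterministic weights for illustration.
d_model, d_ff = 2, 4
w_in = [[0.1 * (i + j + 1) for j in range(d_ff)] for i in range(d_model)]
w_out = [[0.1 * (j - k) for k in range(d_model)] for j in range(d_ff)]

x = [1.0, -0.5]
full = ffn(x, w_in, w_out, d_ff)       # full-width pass ("larger model")
half = ffn(x, w_in, w_out, d_ff // 2)  # nested half-width pass ("smaller model")

# Mix-n-Match: pick a different width per layer to hit a custom size budget.
widths = [4, 2, 3]                      # hypothetical per-layer choices
y = x
for w in widths:
    y = [a + b for a, b in zip(y, ffn(y, w_in, w_out, w))]  # residual add
```

The half-width pass touches only the first half of the hidden units, so no separate small-model checkpoint is needed; this is the property that lets one trained model serve multiple deployment sizes.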