Google has fully released Gemma 3n, a mobile-first AI model optimized for on-device multimodal tasks.
Gemma 3n natively accepts image, audio, video, and text inputs and produces text outputs, and it ships in two sizes, E2B and E4B (roughly 2B and 4B effective parameters).
The model uses the MatFormer architecture, which nests a smaller sub-model inside a larger one, and Per-Layer Embeddings (PLE), which keep a large share of the parameters out of accelerator memory so big parameter counts fit on constrained devices.
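As a rough illustration of the MatFormer idea, the sketch below slices a single feed-forward block so a half-width sub-model reuses a prefix of the full model's weights; the class name, dimensions, and slicing scheme are illustrative assumptions, not Gemma 3n's actual implementation.

```python
import torch
import torch.nn as nn

class NestedFFN(nn.Module):
    """Toy MatFormer-style feed-forward block: a smaller sub-model is the
    prefix slice of the full hidden dimension, so one set of weights can
    serve several model sizes. Dimensions are illustrative only."""
    def __init__(self, d_model: int = 256, d_hidden: int = 1024):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_hidden)
        self.w_out = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor, active: int | None = None) -> torch.Tensor:
        active = active or self.w_in.out_features
        # Use only the first `active` hidden units and the matching
        # output-projection columns: a nested sub-network of the full block.
        h = torch.relu(self.w_in(x)[..., :active])
        return h @ self.w_out.weight[:, :active].T + self.w_out.bias

ffn = NestedFFN()
x = torch.randn(2, 8, 256)
full = ffn(x)                # full-width path (think E4B)
nested = ffn(x, active=512)  # half-width nested sub-model (think E2B)
```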
KV Cache Sharing shortens time-to-first-token on long, streaming inputs, and a new audio encoder based on the Universal Speech Model enables on-device speech recognition and speech translation.
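The sketch below shows generic cross-layer KV sharing, not Gemma 3n's exact scheme: a later layer projects only queries and attends over keys and values that an earlier layer computed once, so the cache is built and stored a single time for the sharing group.

```python
import torch
import torch.nn as nn

class SharedKVAttention(nn.Module):
    """Illustrative cross-layer KV sharing: this layer has no K/V
    projections of its own and reuses a cache filled by a producer layer."""
    def __init__(self, d: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(d, d)

    def forward(self, x, shared_k, shared_v):
        q = self.q_proj(x)
        scores = q @ shared_k.transpose(-2, -1) / q.shape[-1] ** 0.5
        return torch.softmax(scores, dim=-1) @ shared_v

# One producer layer fills the cache; later layers reuse it.
d, x = 64, torch.randn(1, 128, 64)
kv = nn.Linear(d, 2 * d)
k, v = kv(x).chunk(2, dim=-1)   # computed once, cached
layer = SharedKVAttention(d)
out = layer(x, k, v)            # consumer layer skips its own K/V work
```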
A new MobileNet-V5 vision encoder delivers real-time image and video understanding with high efficiency on edge devices.
Gemma 3n integrates with popular tools (Hugging Face, llama.cpp, Google AI Edge) and is available via AI Studio, Hugging Face, and Kaggle.
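For a sense of the Hugging Face path, a minimal sketch follows, assuming the public hub id google/gemma-3n-E4B-it, the image-text-to-text pipeline tag from the model cards, and a recent transformers release with Gemma 3n support; the image URL is a hypothetical placeholder.

```python
# pip install -U transformers timm
from transformers import pipeline

# Hub id assumed from public model cards; use google/gemma-3n-E2B-it
# for the smaller variant if memory is tight.
pipe = pipeline("image-text-to-text", model="google/gemma-3n-E4B-it")

messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
out = pipe(text=messages, max_new_tokens=48)
print(out[0]["generated_text"][-1]["content"])
```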