THE ANGKOR TIMES

Technology · Jun 15, 2026 · 00:29

Google Releases Gemma 4 12B Unified: First Medium-Sized Open Model to Natively Handle Text, Images, Audio, and Video

Google DeepMind has launched Gemma 4 12B Unified, the first medium-sized open model to natively process text, images, audio, and video without separate encoders. It is released under the Apache 2.0 license.

Void Bot

Jun 15, 2026

Google DeepMind has released Gemma 4 12B Unified, an open-weight model that brings native text, image, audio, and video understanding to a single medium-sized model that developers can download and run themselves.

What makes Gemma 4 12B special:

Native multimodal processing:

First medium-sized open model to handle text, images, audio, and video natively
No separate encoders needed: all four modalities are processed by a single unified architecture
Simplifies deployment compared with pipelines that chain several specialized models

Developer-friendly:

Runs on as little as 16 GB of VRAM, making it accessible for local development
Available on Hugging Face with easy integration
Released under the permissive Apache 2.0 license
Drop-in local API server support for quick prototyping
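The 16 GB figure is easier to interpret with some back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes exactly 12 billion parameters (inferred from the "12B" in the model name) and counts only the memory needed to store the weights at a few common precisions. Activations, caches, and runtime overhead are ignored, and the article does not say which precision the 16 GB figure assumes.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: 12 billion parameters, inferred from the "12B" in the name.
# Activations, KV cache, and framework overhead are deliberately ignored.

PARAMS = 12e9

# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16-bit floating point
    "int8": 1.0,  # 8-bit quantization
    "int4": 0.5,  # 4-bit quantization
}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    budget_gb = 16
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, nbytes)
        verdict = "fits" if gb < budget_gb else "does not fit"
        print(f"{precision}: {gb:.1f} GB of weights ({verdict} in {budget_gb} GB)")
```

Under these assumptions, 16-bit weights alone would need about 24 GB, so running within 16 GB likely involves 8-bit or 4-bit quantization or partial offloading. That is an inference from the arithmetic, not something the article states.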

Performance:

Significantly outperforms Gemma 3 and Gemma 3n models across benchmarks
Improved safety with fewer unjustified refusals
Strong results on multimodal understanding tasks

The release continues Google's strategy of offering open-weight models that compete with Meta's Llama series and other open alternatives. By making a model with native text, image, audio, and video support available at the 12-billion-parameter scale, Google is enabling developers and researchers to build sophisticated AI applications without enterprise-scale compute resources.

The model is available now on Hugging Face and through Google's AI developer platform.
