Qwen VLo: Unified Multimodal Model

Qwen VLo can both understand image content and generate or edit images with high semantic consistency.

The model accepts open-ended natural language instructions for tasks such as style transfer, scene reconstruction, and detection or segmentation.

Multilingual instruction support allows users to interact in languages such as Chinese and English.

The model generates images progressively, building each image from top to bottom and left to right, which gives finer control over the output.
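
The following is a generic sketch of raster-order generation, the top-to-bottom, left-to-right ordering described above. It is not Qwen VLo's implementation: the grid of cells and the `predict` callback are hypothetical stand-ins for image regions and the model. The only point illustrated is that each cell is produced after, and can condition on, every cell above it and to its left.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[Optional[int]]]


def raster_order(rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return (row, col) positions top-to-bottom, then left-to-right."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def generate_progressively(
    rows: int,
    cols: int,
    predict: Callable[[Grid, int, int], int],  # hypothetical stand-in for the model
) -> Grid:
    """Fill a grid one cell at a time in raster order.

    When `predict` is called for (r, c), every earlier cell in raster
    order is already filled, so it can condition on that context.
    """
    grid: Grid = [[None] * cols for _ in range(rows)]
    for r, c in raster_order(rows, cols):
        grid[r][c] = predict(grid, r, c)
    return grid


if __name__ == "__main__":
    # Toy "model": each cell records the step at which it was generated.
    out = generate_progressively(2, 3, lambda g, r, c: r * 3 + c)
    print(out)  # [[0, 1, 2], [3, 4, 5]]
```

Generating in a fixed order like this means a partially generated image is always a complete upper region, which is one plausible reason such an ordering can make generation easier to steer.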

It supports dynamic resolution inputs and outputs, handling arbitrary resolutions and aspect ratios.
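
One common way vision models handle arbitrary resolutions is to split an image into a variable-size grid of fixed-size patches, so the number of visual tokens scales with the image instead of forcing a fixed input size. The sketch below illustrates that general idea only. The 28-pixel patch size and the rounding rule are assumptions for illustration, not documented details of Qwen VLo.

```python
from typing import Tuple

PATCH = 28  # assumed patch size in pixels (hypothetical, not from the source)


def patch_grid(height: int, width: int, patch: int = PATCH) -> Tuple[int, int]:
    """Map an arbitrary image size to a (rows, cols) patch grid.

    Each side is rounded to the nearest whole number of patches, with a
    minimum of one, so the aspect ratio is roughly preserved.
    """
    rows = max(1, round(height / patch))
    cols = max(1, round(width / patch))
    return rows, cols


def token_count(height: int, width: int, patch: int = PATCH) -> int:
    """Number of patch tokens for an image under this scheme."""
    rows, cols = patch_grid(height, width, patch)
    return rows * cols


if __name__ == "__main__":
    print(patch_grid(448, 896))   # (16, 32): a 1:2 image keeps its shape
    print(token_count(448, 896))  # 512
```

Under this scheme a wide image simply yields more columns than rows, rather than being padded or cropped to a square.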

As a preview release, Qwen VLo may produce inaccurate or inconsistent results and may not follow instructions reliably.

General AI News • June 27, 2025 at 07:43 PM