Alibaba has unveiled the Wan2.6 series, the latest evolution of its visual generation models, designed to put creators directly into AI-generated videos, complete with their own appearance and voice. With flexible multi-shot storytelling, enhanced multi-person dialogue, and extended durations, the new series opens up fresh creative possibilities for professional-grade content production.

At the heart of the series is Wan2.6-R2V, a reference-to-video generation model that lets users upload a reference video of a character, from which the model captures both appearance and voice, and then generate new scenes starring that same character from simple text prompts. Creators can bring people, animals, objects, or multiple subjects to life in AI videos while preserving the distinct look and sound of the original reference.
Powered by multimodal reference generation capabilities, Wan2.6-R2V is China's first reference-to-video model to ensure both visual and audio consistency, enabling short-form drama creators and storytellers to streamline production while delivering richer, more immersive narratives.
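To make the reference-to-video workflow concrete, the sketch below shows what a request of this kind could look like through a generic HTTP API. The endpoint URL, model identifier, authentication scheme, and field names are illustrative assumptions only; Alibaba's published interface for Wan2.6-R2V may differ.

```python
# Minimal sketch of a reference-to-video request.
# The endpoint, model name, and parameter names are hypothetical placeholders,
# not Alibaba's documented API.
import requests

API_URL = "https://example.com/v1/video/generations"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                              # assumed bearer-token auth

payload = {
    "model": "wan2.6-r2v",  # reference-to-video model (identifier assumed)
    # Reference clip that provides the character's appearance and voice:
    "reference_video_url": "https://example.com/assets/host_clip.mp4",
    # Text prompt describing the new scene starring the same character:
    "prompt": (
        "The same host presents a cooking tutorial in a sunlit kitchen, "
        "speaking directly to the camera."
    ),
    "duration_seconds": 10,  # hypothetical length parameter
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
# A service like this would typically return a task ID or a URL to the finished video.
print(response.json())
```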
The Wan2.6 series also brings upgrades to Alibaba’s other models, including:
- Wan2.6-T2V: Enhanced text-to-video generation
- Wan2.6-I2V: Improved image-to-video capabilities
- Wan2.6-image & Wan2.6-T2I: Upgraded image generation models
The series introduces intelligent multi-shot storytelling, allowing for more expressive narratives with consistent visuals across scenes. Additionally, improvements in audio-visual synchronization and audio-to-video generation create realistic soundscapes that elevate the immersive quality of AI-generated content.