Tongyi Wanxiang AI Video Generation

Wan 2.5 Preview

A revolution in multisensory storytelling: Wan 2.5 integrates native audio with cinematic-grade visual control, redefining the boundaries of AI video creation.

Generational Leap in Capabilities

Wan 2.5 integrates the essence of previous models while achieving qualitative breakthroughs in key dimensions.

Multisensory Storytelling

The first release in the series to generate synchronized audio and video, providing native narration, precise lip-sync, and immersive environmental sound effects.

Cinematic 4K Quality

Supports up to 4K resolution output, presenting photo-realistic faces, skin textures, and clothing details that meet professional production standards.

Precise Cinematic Control

Provides advanced camera controls including pan, zoom, and focus switching, allowing creators to 'direct' scenes rather than just 'describe' them.

Extended Narrative Duration

Supports generating video clips of 10 seconds or longer, enough to carry a complete narrative rhythm or a short advertisement.

Evolution Path: From Open Source to Peak

Wan 2.5 stands on the shoulders of giants, representing the inevitable result of technical iteration and strategic evolution.

Wan 2.1 / 2.2

Open Source Foundation

Established community leadership and popularized high-performance video generation.


MoE Architecture Revolution

Introduced Mixture-of-Experts architecture, achieving scalable model performance.

Wan 2.5 Preview

Capability Integration

Integrates audio, animation, and advanced control into a unified model.


Commercial API

Shifts to high-end professional market, providing closed-source API services.
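Since Wan 2.5 Preview is offered only through a closed-source API, integration happens over a request/response interface rather than local inference. The sketch below shows what assembling such a request might look like; the endpoint URL, model identifier, and parameter names are illustrative assumptions, not the documented API.

```python
import json

# Placeholder endpoint -- the real API URL and authentication scheme
# are defined by the provider, not shown here.
API_URL = "https://example.com/api/v1/video/generation"

def build_request(prompt: str, resolution: str = "1080p",
                  duration_s: int = 10, audio: bool = True) -> dict:
    """Assemble a JSON-serializable payload for a text-to-video call.

    Field names ("model", "input", "parameters", etc.) are hypothetical;
    consult the provider's API reference for the actual schema.
    """
    if resolution not in {"480p", "720p", "1080p", "4k"}:
        raise ValueError(f"unsupported resolution: {resolution}")
    return {
        "model": "wan2.5-preview",        # assumed model identifier
        "input": {"prompt": prompt},
        "parameters": {
            "resolution": resolution,
            "duration": duration_s,       # Wan 2.5 supports clips of 10+ seconds
            "enable_audio": audio,        # native audio-video sync
        },
    }

payload = build_request("A golden retriever surfing at sunset, cinematic pan")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the provider's endpoint with an API key; only the request structure is sketched here.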

Reshaping Market Structure

The release of Wan 2.5 marks the generative video market entering a new era of three-tier structure.

Tier 1: High-End Closed Source

Industry Benchmark

Flagship models provided by top laboratories (OpenAI, Google, Alibaba) through API access, pursuing highest quality and strongest control.

Representatives: Sora, Veo, Wan 2.5

Tier 2: Legacy Open Source

Community Mainstay

High-quality open-source models one generation behind the frontier, serving as the core for community experimentation, learning, and non-commercial projects.

Representatives: Wan 2.2, Stable Video Diffusion

Tier 3: Independent Open Source

Innovation Pioneers

Community-driven small or specialized models providing unique features or optimized for specific hardware, serving as the source of ecosystem diversity.

Representatives: Community Models

Wan Model Series Features and Architecture Comparison

The table below compares core architecture, key innovations, and release models across the Wan series, tracing its evolution from open accessibility to professional commercialization.

| Feature | Wan 2.1 | Wan 2.2 | Wan 2.5 Preview (Announced/Speculated) |
| --- | --- | --- | --- |
| Core Architecture | Standard Diffusion Transformer | Mixture-of-Experts (MoE), high/low noise | Evolved MoE architecture |
| Model Scale | 1.3B and 14B parameters | 14B active / 27B total parameters | Possibly >30B total parameters |
| Key Innovation | Open-source accessibility and efficiency | MoE achieves scalable performance | Integrated multimodal (audio-video) |
| Maximum Resolution | 720p (unstable), 480p (recommended) | 720p / 1080p | 4K (claimed), 1080p (API confirmed) |
| Maximum Duration | ~3-5 seconds | ~5 seconds | 10+ seconds |
| Core Modality | T2V, I2V, video editing | T2V, I2V, plus dedicated S2V and Animate models | Unified T2V, I2V, audio-video sync, advanced animation |
| Cinematic Control | Basic | "Cinematic aesthetic control" | Precise camera, lighting, and scene control |
| Release Model | Open source (Apache 2.0) | Open source (Apache 2.0) | API only (closed source) |