Tongyi Wanxiang AI Video Generation

Wan 2.5 Preview

A revolution in multisensory storytelling: Wan 2.5 integrates native audio with cinematic-grade visual control, redefining the boundaries of AI video creation.

Generational Leap in Capabilities

Wan 2.5 integrates the essence of previous models while achieving qualitative breakthroughs in key dimensions.

Multisensory Storytelling

The first release in the series to generate synchronized audio and video, providing native narration, precise lip-sync, and immersive environmental sound effects.

Cinematic 4K Quality

Supports up to 4K resolution output, presenting photo-realistic faces, skin textures, and clothing details that meet professional production standards.

Precise Cinematic Control

Provides advanced camera controls including pan, zoom, and focus switching, allowing creators to 'direct' scenes rather than just 'describe' them.

Extended Narrative Duration

Supports generating video clips of 10 seconds or longer, enough to carry a complete narrative rhythm or a short advertisement.

Evolution Path: From Open Source to Peak

Wan 2.5 stands on the shoulders of giants, representing the inevitable result of technical iteration and strategic evolution.

Wan 2.1 / 2.2

Open Source Foundation

Established community leadership and popularized high-performance video generation.


MoE Architecture Revolution

Introduced Mixture-of-Experts architecture, achieving scalable model performance.

Wan 2.5 Preview

Capability Integration

Integrates audio, animation, and advanced control into a unified model.


Commercial API

Shifts to high-end professional market, providing closed-source API services.
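Since Wan 2.5 Preview is offered only through a closed-source API, integration happens over a request/response interface rather than local inference. The sketch below shows what assembling such a request might look like; the endpoint URL, model identifier, and parameter names are illustrative assumptions, not the documented API.

```python
import json

# Placeholder endpoint -- the real API URL and authentication scheme
# are defined by the provider, not shown here.
API_URL = "https://example.com/api/v1/video/generation"

def build_request(prompt: str, resolution: str = "1080p",
                  duration_s: int = 10, audio: bool = True) -> dict:
    """Assemble a JSON-serializable payload for a text-to-video call.

    Field names ("model", "input", "parameters", etc.) are hypothetical;
    consult the provider's API reference for the actual schema.
    """
    if resolution not in {"480p", "720p", "1080p", "4k"}:
        raise ValueError(f"unsupported resolution: {resolution}")
    return {
        "model": "wan2.5-preview",        # assumed model identifier
        "input": {"prompt": prompt},
        "parameters": {
            "resolution": resolution,
            "duration": duration_s,       # Wan 2.5 supports clips of 10+ seconds
            "enable_audio": audio,        # native audio-video sync
        },
    }

payload = build_request("A golden retriever surfing at sunset, cinematic pan")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the provider's endpoint with an API key; only the request structure is sketched here.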

Reshaping Market Structure

The release of Wan 2.5 marks the generative video market entering a new era of three-tier structure.

Tier 1: High-End Closed Source

Industry Benchmark

Flagship models provided by top laboratories (OpenAI, Google, Alibaba) through API access, pursuing highest quality and strongest control.

Representatives: Sora, Veo, Wan 2.5

Tier 2: Legacy Open Source

Community Mainstay

High-quality open-source models one generation behind the frontier, serving as the core for community experimentation, learning, and non-commercial projects.

Representatives: Wan 2.2, Stable Video Diffusion

Tier 3: Independent Open Source

Innovation Pioneers

Community-driven small or specialized models providing unique features or optimized for specific hardware, serving as the source of ecosystem diversity.

Representatives: Community Models

Wan Model Series Features and Architecture Comparison

The table below compares core architecture, key innovations, and release models across the Wan series, tracing its evolution from open accessibility to professional commercialization.

| Feature | Wan 2.1 | Wan 2.2 | Wan 2.5 Preview (Announced/Speculated) |
| --- | --- | --- | --- |
| Core Architecture | Standard Diffusion Transformer | Mixture-of-Experts (MoE), high/low noise | Evolved MoE architecture |
| Model Scale | 1.3B and 14B parameters | 14B active / 27B total parameters | Possibly >30B total parameters |
| Key Innovation | Open-source accessibility and efficiency | MoE achieves scalable performance | Integrated multimodal (audio-video) |
| Maximum Resolution | 720p (unstable), 480p (recommended) | 720p / 1080p | 4K (claimed), 1080p (API confirmed) |
| Maximum Duration | ~3-5 seconds | ~5 seconds | 10+ seconds |
| Core Modality | T2V, I2V, video editing | T2V, I2V, plus dedicated S2V and Animate models | Unified T2V, I2V, audio-video sync, advanced animation |
| Cinematic Control | Basic | "Cinematic aesthetic control" | Precise camera, lighting, and scene control |
| Release Model | Open source (Apache 2.0) | Open source (Apache 2.0) | API only (closed source) |