Wan 2.5 Preview
A revolution in multisensory storytelling: Wan 2.5 integrates native audio with cinematic-grade visual control, redefining the boundaries of AI video creation.
Generational Leap in Capabilities
Wan 2.5 builds on the strengths of earlier Wan models and advances in four key areas: audio, resolution, camera control, and clip duration.
Multisensory Storytelling
Introduces synchronized audio-video generation, with native narration, precise lip-sync, and immersive ambient sound effects.
Cinematic 4K Quality
Claims output up to 4K resolution (1080p is what the API currently confirms), rendering photorealistic faces, skin textures, and clothing details that meet professional production standards.
Precise Cinematic Control
Provides advanced camera controls including pan, zoom, and focus switching, allowing creators to 'direct' scenes rather than just 'describe' them.
Extended Narrative Duration
Generates video clips of 10 seconds or longer, enough for a complete narrative beat or a short advertisement.
Evolution Path: From Open Source to Flagship
Wan 2.5 builds on Wan 2.1 and 2.2, continuing their technical iteration while making a strategic shift from open source to a commercial API.
Wan 2.1 / 2.2
Open Source Foundation
Established community leadership and popularized high-performance video generation.
MoE Architecture Revolution
Wan 2.2 introduced a Mixture-of-Experts (MoE) architecture for scalable model performance.
Wan 2.5 Preview
Capability Integration
Integrates audio, animation, and advanced control into a unified model.
Commercial API
Shifts to the high-end professional market, offered as a closed-source API service.
Reshaping Market Structure
The release of Wan 2.5 marks the generative video market's entry into a new three-tier structure.
Industry Benchmark
Flagship models from top laboratories (OpenAI, Google, Alibaba), offered through API access and aiming for the highest quality and strongest control.
Representatives: Sora, Veo, Wan 2.5
Community Mainstay
High-quality open-source models that are one generation behind the flagships, serving as the core for community experimentation, learning, and non-commercial projects.
Representatives: Wan 2.2, Stable Video Diffusion
Innovation Pioneers
Small or specialized community-driven models that offer unique features or are optimized for specific hardware, serving as the source of ecosystem diversity.
Representatives: Community Models
Wan Model Series Features and Architecture Comparison
The table below traces the Wan model series from open accessibility to professional commercialization, comparing core architecture, key innovations, and release models.
Feature | Wan 2.1 | Wan 2.2 | Wan 2.5 Preview (Announced/Speculated) |
---|---|---|---|
Core Architecture | Standard Diffusion Transformer | Mixture-of-Experts (MoE) (High/Low Noise) | Evolved MoE Architecture |
Model Scale | 1.3B and 14B parameters | 14B active / 27B total parameters | Possibly >30B total parameters |
Key Innovation | Open source accessibility and efficiency | MoE achieves scalable performance | Integrated multimodal (audio-video) |
Maximum Resolution | 720p (unstable), 480p (recommended) | 720p / 1080p | 4K (claimed), 1080p (API confirmed) |
Maximum Duration | ~3-5 seconds | ~5 seconds | 10+ seconds |
Core Modality | T2V, I2V, video editing | T2V, I2V, and dedicated S2V and Animate models | Unified T2V, I2V, audio-video sync, advanced animation |
Cinematic Control | Basic | "Cinematic aesthetic control" | Precise camera, lighting, and scene control |
Release Model | Open source (Apache 2.0) | Open source (Apache 2.0) | API only (closed source) |
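
To make the "14B active / 27B total" entry for Wan 2.2 more concrete, here is a minimal sketch of how a two-expert design could route work. It reads the table's "(High/Low Noise)" label as two experts chosen by the noise level of each denoising step, so only one expert runs at a time. The function names, the 0.5 boundary, and the linear noise schedule are illustrative assumptions, not Wan's actual implementation.

```python
from typing import List, Tuple


def select_expert(noise_level: float, boundary: float = 0.5) -> str:
    """Pick the expert for one denoising step.

    noise_level runs from 1.0 (pure noise) to 0.0 (clean output).
    The boundary value is a hypothetical choice for illustration.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    return "high_noise" if noise_level >= boundary else "low_noise"


def route_schedule(num_steps: int, boundary: float = 0.5) -> List[Tuple[float, str]]:
    """Assign an expert to every step of a linearly decreasing noise schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    levels = [1.0 - i / (num_steps - 1) for i in range(num_steps)]
    return [(level, select_expert(level, boundary)) for level in levels]


if __name__ == "__main__":
    # Each step activates exactly one expert, so the parameters in use
    # per step are smaller than the total stored across both experts.
    for level, expert in route_schedule(5):
        print(f"noise={level:.2f} -> {expert}")
```

Under these assumptions, early high-noise steps go to one expert and later low-noise steps go to the other, which is why the active parameter count can be roughly half of the total.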