Wan2.1 is a powerful series of open-source video generation models from Alibaba.
The series includes:
Model Type | Resolution | VRAM (approx.) |
---|---|---|
Text-to-Video 14B (T2V) | 480P / 720P | ~40GB |
Text-to-Video 1.3B (T2V) | 480P | ~8–15GB |
Image-to-Video 14B (I2V) | 480P / 720P | ~40GB |
Visual Text Generation | Multilingual (Chinese/English) | Variable |
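
As a rough illustration of how the VRAM column can guide model choice, the sketch below filters the models by a memory budget. The figures are copied from the table; taking the upper end of the 1.3B range (15GB) and treating the values as hard thresholds are simplifying assumptions, and the function name `models_that_fit` is hypothetical.

```python
# Approximate VRAM figures (GB) copied from the table above.
# Assumption: for the 1.3B model the upper end of the ~8-15GB range is used.
MODEL_VRAM_GB = {
    "Text-to-Video 14B (T2V)": 40,
    "Text-to-Video 1.3B (T2V)": 15,
    "Image-to-Video 14B (I2V)": 40,
}


def models_that_fit(available_vram_gb: float) -> list[str]:
    """Return the models whose approximate VRAM need is within the budget."""
    return [
        name
        for name, needed_gb in MODEL_VRAM_GB.items()
        if needed_gb <= available_vram_gb
    ]


# Print the candidates for a few example budgets.
for budget_gb in (12, 24, 48):
    print(budget_gb, "GB ->", models_that_fit(budget_gb))
```

The table values are approximate, so treat the result as a starting point rather than a guarantee.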

File Description | Filename | Target Folder |
---|---|---|
Text Encoder | umt5_xxl_fp8_e4m3fn_scaled.safetensors | ComfyUI/models/text_encoders/ |
VAE | wan_2.1_vae.safetensors | ComfyUI/models/vae/ |
CLIP Vision (for Image-to-Video) | clip_vision_h.safetensors | ComfyUI/models/clip_vision/ |
Video Model (Diffusion Model) | Select from this directory | ComfyUI/models/diffusion_models/ |
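
Before loading a workflow, it can help to confirm that the files in the table sit in their target folders. The following is a minimal sketch, assuming a standard ComfyUI directory layout; the helper name `missing_files` and the `"ComfyUI"` root path are placeholders, and the diffusion model is left out because its filename depends on the variant you pick.

```python
from pathlib import Path

# Filenames and target folders copied from the table above.
# The diffusion model is omitted: its filename depends on the chosen variant.
REQUIRED_FILES = {
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/text_encoders",
    "wan_2.1_vae.safetensors": "models/vae",
    "clip_vision_h.safetensors": "models/clip_vision",
}


def missing_files(comfyui_root: str) -> list[str]:
    """Return relative paths of required files not found under comfyui_root."""
    root = Path(comfyui_root)
    missing = []
    for filename, folder in REQUIRED_FILES.items():
        relative = Path(folder) / filename
        if not (root / relative).is_file():
            missing.append(relative.as_posix())
    return missing


# "ComfyUI" is a placeholder; point it at your own installation directory.
for path in missing_files("ComfyUI"):
    print("missing:", path)
```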
Video Model Recommendation: fp16 > bf16 > fp8_scaled > fp8_e4m3fn
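
The recommendation order above can be applied mechanically when several variants of the same model are already downloaded. The sketch below assumes the precision appears as the final suffix of the filename, as in `wan2.1_t2v_1.3B_fp16.safetensors`; the function name `pick_preferred` and the other example filenames are illustrative, not confirmed download names.

```python
from typing import Optional

# Precision suffixes in the recommended order, most preferred first.
PRECISION_ORDER = ["fp16", "bf16", "fp8_scaled", "fp8_e4m3fn"]


def pick_preferred(filenames: list[str]) -> Optional[str]:
    """Return the file with the most preferred precision suffix, or None."""
    for precision in PRECISION_ORDER:
        for name in filenames:
            # Assumption: the precision is the last underscore-separated suffix.
            if name.endswith(f"_{precision}.safetensors"):
                return name
    return None


# Illustrative filenames that follow the naming pattern.
available = [
    "wan2.1_t2v_14B_fp8_e4m3fn.safetensors",
    "wan2.1_t2v_14B_bf16.safetensors",
]
print(pick_preferred(available))
```

This ordering does not account for memory limits, so a lower-ranked variant may still be the practical choice on a smaller GPU.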
ComfyUI provides JSON-based workflows. You can find these JSON files in the official ComfyUI examples or documentation. Here are GIF demonstrations of some workflows:
The text-to-video workflow can be used with the 1.3B or 14B models. For example, use wan2.1_t2v_1.3B_fp16.safetensors (place it in ComfyUI/models/diffusion_models/).
Output: 480p / 720p (depends on the selected model and settings)
Runtime: Generating a 5-second 480p video with an RTX 4090 takes about 4 minutes.
Workflow Example (1.3B 480p):
Workflow Example (14B 720p):
JSON Workflow File: text_to_video_wan.json
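
Because a workflow is plain JSON, you can inspect it to see which model files it expects before opening it in ComfyUI. The sketch below walks any JSON structure and collects strings ending in `.safetensors`. The `SAMPLE_WORKFLOW` string is a simplified, hypothetical stand-in; the real layout of text_to_video_wan.json may differ, which is why the helper makes no assumptions about keys.

```python
import json


def referenced_model_files(node) -> set[str]:
    """Recursively collect every string value ending in .safetensors."""
    found: set[str] = set()
    if isinstance(node, str):
        if node.endswith(".safetensors"):
            found.add(node)
    elif isinstance(node, dict):
        for value in node.values():
            found |= referenced_model_files(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_model_files(item)
    return found


# Hypothetical, simplified workflow used only for illustration.
SAMPLE_WORKFLOW = """
{"nodes": [
  {"type": "loader_a", "widgets_values": ["wan2.1_t2v_1.3B_fp16.safetensors", "default"]},
  {"type": "loader_b", "widgets_values": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"]},
  {"type": "loader_c", "widgets_values": ["wan_2.1_vae.safetensors"]}
]}
"""

workflow = json.loads(SAMPLE_WORKFLOW)
for name in sorted(referenced_model_files(workflow)):
    print(name)
```

To inspect a downloaded workflow, replace `json.loads(SAMPLE_WORKFLOW)` with `json.load` on the opened file.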
The image-to-video workflow requires the following files:
- wan2.1_i2v_480p_14B_fp16.safetensors for 480p output (place in ComfyUI/models/diffusion_models/)
- wan2.1_i2v_720p_14B_fp16.safetensors for 720p output (place in ComfyUI/models/diffusion_models/)
- clip_vision_h.safetensors (place in ComfyUI/models/clip_vision/)

Output: 480p (default example: 33 frames @ 512x512) or 720p (if VRAM and hardware allow).
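
To relate a frame count such as the 33-frame default to clip length, a quick calculation helps. The sketch below assumes a playback rate of 16 frames per second; that rate is not stated in this guide, so substitute whatever your workflow's video output node uses. Both function names are hypothetical.

```python
# Assumption: 16 fps playback. Adjust to match your workflow's output settings.
ASSUMED_FPS = 16.0


def clip_duration_seconds(num_frames: int, fps: float = ASSUMED_FPS) -> float:
    """Length of a clip in seconds for a given frame count."""
    return num_frames / fps


def frames_for_duration(seconds: float, fps: float = ASSUMED_FPS) -> int:
    """Frame count needed to cover a duration, rounded to the nearest frame."""
    return round(seconds * fps)


print(clip_duration_seconds(33))  # duration of the 33-frame default example
print(frames_for_duration(5))     # frames needed for a 5-second clip
```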
Workflow Example (14B 480p):
Workflow Example (14B 720p):
JSON Workflow File: image_to_video_wan_example.json
… (umt5_xxl_fp8_e4m3fn_scaled.safetensors). … umt5_xxl_fp8_e4m3fn_scaled.safetensors, you need about 40GB of VRAM.