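Veo 3 demonstrates zero-shot capabilities across a wide range of visual tasks, suggesting that video models are on a path to becoming vision foundation models, just as large language models became foundation models for language.
The remarkable zero-shot capabilities of large language models (LLMs) have moved natural language processing from task-specific models toward unified, general-purpose foundation models. This shift emerged from simple primitives: large generative models trained on web-scale data. Interestingly, the same primitives apply to today's generative video models. Will video models develop general-purpose visual understanding, much as LLMs developed general-purpose language understanding?
This study shows that Veo 3 can solve, zero-shot, a broad variety of tasks it was not explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning such as maze solving and symmetry solving. Veo 3's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.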
Edge detection
Segmentation
Keypoint localization
Super-resolution
Blind deblurring
Blind denoising
Low-light enhancement
Conjunctive search
Dalmatian illusion understanding
Shape cue-conflict understanding
Rorschach blot interpretation
Material properties (flammability)
Rigid body transform
Soft body transform
Gravity (earth)
Gravity (moon)
Buoyancy (bottle cap)
Buoyancy (rock)
Visual Jenga
Object packing
Material optics (glass)
Material optics (mirror)
Color mixing (additive)
Color mixing (subtractive)
Categorizing objects
Omniglot (recognition)
Omniglot (generation)
Omniglot (parsing)
Memory of world states
Background removal
Style transfer
Colorization
Inpainting
Outpainting
Text manipulation
Image editing with doodles
Scene composition
Novel view synthesis
3D-aware reposing
Transfiguration
Professional headshot
Dexterous manipulation (jar)
Dexterous manipulation (throw/catch)
Dexterous manipulation (baoding balls)
Affordance recognition
Drawing
Visual instruction
Graph traversal
Tree BFS
Sequence (dots)
Sequence (arrows)
Sequence (circles)
Sequence (squares)
Connecting colors
Shape fitting
Sorting numbers
Tool use
Simple sudoku completion
Water puzzle solving
Maze solving (mouse)
Robot navigation
Rule extrapolation
Analogy (color)
Analogy (resize)
Analogy (reflect)
Analogy (rotate)
Maze (5x5)
Maze (7x7)
Maze (9x9)
Maze (irregular)
Symmetry (shape)
Symmetry (random)
Monocular depth estimation
Monocular surface normal estimation
Force prompting
Motion trajectory prompting
Tying the knot
Connect the path puzzle
Letter word search
Eulerian path
Solving linear equations
Spot the difference
Visual IQ test
Glass falling
Collisions
Jigsaw puzzle
Sliding puzzle
Scrambled puzzle
Bottleneck
Laundry folding
Motion planning