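Veo 3 shows zero-shot capabilities across a wide range of visual tasks, suggesting that video models are on a path to becoming vision foundation models, just as large language models became foundation models for language.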
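The remarkable zero-shot capabilities of large language models (LLMs) have moved natural language processing from task-specific models to unified, generalist foundation models. This shift emerged from simple primitives: large generative models trained on web-scale data. Curiously, the same primitives apply to today's generative video models. Will video models move toward general-purpose visual understanding, much as LLMs developed general-purpose language understanding?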
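This work shows that Veo 3 can solve, zero-shot, a broad variety of tasks it was not explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and more. These abilities to perceive, model, and manipulate the visual world enable early forms of visual reasoning such as maze solving and symmetry solving. Veo 3's emergent zero-shot capabilities indicate that video models are on a path to becoming unified, generalist vision foundation models.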
Edge detection
Segmentation
Keypoint localization
Super-resolution
Blind deblurring
Blind denoising
Low-light enhancement
Conjunctive search
Dalmatian illusion understanding
Shape cue-conflict understanding
Rorschach blot interpretation
Material properties (flammability)
Rigid body transform
Soft body transform
Gravity (earth)
Gravity (moon)
Buoyancy (bottle cap)
Buoyancy (rock)
Visual Jenga
Object packing
Material optics (glass)
Material optics (mirror)
Color mixing (additive)
Color mixing (subtractive)
Categorizing objects
Omniglot (recognition)
Omniglot (generation)
Omniglot (parsing)
Memory of world states
Background removal
Style transfer
Colorization
Inpainting
Outpainting
Text manipulation
Image editing with doodles
Scene composition
Novel view synthesis
3D-aware reposing
Transfiguration
Professional headshot
Dexterous manipulation (jar)
Dexterous manipulation (throw/catch)
Dexterous manipulation (baoding balls)
Affordance recognition
Drawing
Visual instruction
Graph traversal
Tree BFS
Sequence (dots)
Sequence (arrows)
Sequence (circles)
Sequence (squares)
Connecting colors
Shape fitting
Sorting numbers
Tool use
Simple sudoku completion
Water puzzle solving
Maze solving (mouse)
Robot navigation
Rule extrapolation
Analogy (color)
Analogy (resize)
Analogy (reflect)
Analogy (rotate)
Maze (5x5)
Maze (7x7)
Maze (9x9)
Maze (irregular)
Symmetry (shape)
Symmetry (random)
Monocular depth estimation
Monocular surface normal estimation
Force prompting
Motion trajectory prompting
Tying the knot
Connect the path puzzle
Letter word search
Eulerian path
Solving linear equations
Spot the difference
Visual IQ test
Glass falling
Collisions
Jigsaw puzzle
Sliding puzzle
Scrambled puzzle
Bottleneck
Laundry folding
Motion planning