While browsing AI-related technology recently, I came across some news: on February 16, OpenAI officially announced its large text-to-video model, Sora:

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.

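"60 seconds." "Text-to-video." While others are still working on coherence, Sora has already reached a very high level.

By the time I saw the news, five days had passed, and a search for "AI" already turned up several articles introducing and analyzing it.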

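What Sora is

OpenAI has not yet opened Sora up for limited public testing, but judging from the demos released so far, it is already impressive. Sam Altman has been promoting it heavily on X, generating videos from prompts that users submit and posting them as replies.

According to the introduction on OpenAI's site, https://openai.com/sora, Sora currently has the following capabilities: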

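  • It can create realistic and imaginative scenes from text instructions;
  • It can generate videos up to a minute long;
  • It can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background;
  • It can create multiple shots within a single generated video while accurately preserving characters and visual style.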

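Its shortcomings are:

  • It may struggle to accurately simulate the physics of a complex scene, and it may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward the cookie may show no bite mark;
  • It may confuse the spatial details of a prompt, for example mixing up left and right, and it may struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.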

How Sora works

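Brett Goldstein wrote on X that Sora's workflow is to start the model from noise and let it diffuse, gradually refining the noise into the desired video, with a Diffusion Transformer architecture processing the consecutive video frames so that motion in the video stays smooth and natural. In his description, it relies on so-called "patches" to turn text into snapshots of events and time.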
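
The "start from noise and gradually refine" loop described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: Sora's actual sampler is not public, so the code uses a deterministic DDIM-style update, a made-up noise schedule (`alpha_bar`), and a hypothetical `oracle_noise` function that cheats by knowing the target. In a real system a trained Diffusion Transformer would take that function's place, and the list of four numbers would be a compressed video.

```python
import math
import random

NUM_STEPS = 50

def alpha_bar(t):
    """Toy schedule (an assumption): signal fraction is 1.0 at t=0 and 1e-4 at t=NUM_STEPS."""
    return 1.0 - 0.9999 * (t / NUM_STEPS) ** 2

def oracle_noise(x_t, target, a_bar):
    """Hypothetical stand-in for a trained noise predictor; it cheats by using the known target."""
    return [(x - math.sqrt(a_bar) * y) / math.sqrt(1.0 - a_bar)
            for x, y in zip(x_t, target)]

def denoise(target, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    for t in range(NUM_STEPS, 0, -1):
        a_t, a_prev = alpha_bar(t), alpha_bar(t - 1)
        eps = oracle_noise(x, target, a_t)
        # Clean-signal estimate implied by the predicted noise.
        x0 = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t)
              for xi, e in zip(x, eps)]
        # Deterministic DDIM-style step to the next, less noisy level.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * e
             for x0i, e in zip(x0, eps)]
    return x

target = [0.2, -0.5, 0.9, 0.0]  # stands in for a flattened video
result = denoise(target)
print(result)
```

Because the stand-in predictor is exact, the loop lands on the target. A learned predictor would only approximate the noise, which is why real samplers take many small steps.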

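Sora first turns visual data into patches. A video compression network maps raw video into a lower-dimensional latent space, and Sora is trained on, and generates video within, that compressed space. Images are treated as single-frame videos, and a sequence of spacetime patches is extracted from each compressed video, which lets the model train on videos and images of variable resolution, duration, and aspect ratio. Sora brings together a Transformer architecture, a diffusion model, scaled-up transformers, a captioner model, and the simulation of digital worlds to produce a complete video.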
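
A minimal sketch of the spacetime-patch idea described above, using NumPy. The function name `extract_spacetime_patches` and the patch sizes are hypothetical, and the sketch cuts patches from raw pixel arrays for simplicity, whereas the technical report describes patching the compressed latent representation. It is meant only to show how inputs of different duration, resolution, and aspect ratio all become sequences of same-width tokens.

```python
import numpy as np

def extract_spacetime_patches(video, pt, ph, pw):
    """Split a (T, H, W, C) array into flattened spacetime patches.

    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    t, h, w, c = video.shape
    if t % pt or h % ph or w % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, patch size).
    x = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    # Bring the three patch-index axes to the front.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch; these rows play the role of transformer tokens.
    return x.reshape(-1, pt * ph * pw * c)

# Two toy "videos" with different duration, resolution, and aspect ratio.
square_clip = np.arange(4 * 8 * 8 * 3, dtype=np.float32).reshape(4, 8, 8, 3)
wide_clip = np.zeros((2, 4, 12, 3), dtype=np.float32)

square_tokens = extract_spacetime_patches(square_clip, pt=2, ph=4, pw=4)
wide_tokens = extract_spacetime_patches(wide_clip, pt=2, ph=4, pw=4)
print(square_tokens.shape)  # (8, 96)
print(wide_tokens.shape)    # (3, 96)
```

Both clips produce tokens of the same width (96 values here), and only the sequence length differs. A still image would fit the same scheme as a clip with a single frame, given a temporal patch size of 1.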

For a mind map of the technical report, see https://anyubenyu.com/sora-technical-report-mindmap/

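Official demos (selected)

Selected prompts from the official site and their generated results

Only a few of the more classic shots are shown here, as compressed screenshots; the original videos are available on the official site.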

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures…

Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.

Prompt: Tour of an art gallery with many beautiful works of art in different styles.

Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.

References

https://openai.com/sora
https://www.zhihu.com/question/644478200
https://juejin.cn/post/7336756519240908800
https://openai.com/research/video-generation-models-as-world-simulators
https://chatgpt-zhinan.online/sora-technical-report-mindmap/