While browsing AI news recently, I came across an announcement: on February 16, OpenAI officially announced Sora, its large text-to-video model:

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.

"60 seconds" and "text-to-video". While others are still working on temporal coherence, Sora has already reached a very high level.


What is Sora

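OpenAI has not yet opened Sora for limited (gray-release) testing, but the demos released so far are already impressive. Sam Altman has been promoting it heavily on X, generating videos from prompts that users submit and posting them in reply.

According to the introduction on OpenAI's site, https://openai.com/sora, Sora currently has the following capabilities: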

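  • It can create realistic and imaginative scenes from text instructions;
  • It can generate videos up to one minute long;
  • It can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background;
  • It can create multiple shots within a single generated video while accurately preserving characters and visual style.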


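OpenAI also notes that the current model has weaknesses:

  • It may struggle to accurately simulate the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward the cookie may show no bite mark;
  • It may confuse spatial details of a prompt, for example mixing up left and right, and may struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.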

How Sora works


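Brett Goldstein wrote on X that Sora's workflow starts from noise and lets the model diffuse, gradually refining the result into the desired video. It then uses a Diffusion Transformer architecture to process successive video frames, so that motion in the video stays smooth and natural. It relies on so-called "patches", which turn text into snapshots of events and time.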
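The "start from noise and refine step by step" idea can be illustrated with a toy loop. This is only a sketch under heavy assumptions: `toy_denoise` is a hypothetical name, the "video" is a short list of numbers, and the denoiser is a stand-in that already knows the clean target. In a real diffusion model that prediction comes from a trained network (a Diffusion Transformer in Sora's case), and nothing here reflects OpenAI's actual code.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Refine pure Gaussian noise toward `target` over `steps` iterations."""
    rng = random.Random(seed)
    # Start from pure noise, one sample per value in the target signal.
    x = [rng.gauss(0.0, 1.0) for _ in target]
    for t in range(steps, 0, -1):
        # ASSUMPTION: stand-in for a learned denoiser. A trained model
        # would predict the clean signal from the noisy input x.
        predicted_clean = target
        # Close a fraction 1/t of the remaining gap, so the last step
        # (t == 1) lands on the prediction.
        x = [xi + (pi - xi) / t for xi, pi in zip(x, predicted_clean)]
    return x


target = [0.5, -1.0, 2.0]
result = toy_denoise(target)
```

The point is the shape of the loop: many small refinement steps, each guided by a prediction of the clean result, rather than a single jump from noise to output.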

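Sora first turns visual data into patches: a video compression network reduces raw video to a lower-dimensional latent representation, which is then decomposed into a sequence of spacetime patches. Sora is trained on, and generates videos within, this compressed latent space. Because an image is treated as a single-frame video, the same patch representation lets Sora train on videos and images of variable resolution, duration, and aspect ratio. Sora combines a Transformer architecture, diffusion models, scaling transformers, a captioner model, and the idea of simulating the digital world to produce a complete video.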
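To make the spacetime-patch idea concrete, here is a minimal sketch that cuts a tiny "video" into non-overlapping patches spanning time, height, and width. The names `extract_spacetime_patches` and `make_video` and the patch sizes are hypothetical, chosen for illustration. The sketch works on raw values in nested lists, whereas Sora is described as patching a compressed latent representation.

```python
def extract_spacetime_patches(video, pt, ph, pw):
    """Split video[T][H][W] into flattened patches of size pt x ph x pw."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    if frames % pt or height % ph or width % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    patches = []
    for t0 in range(0, frames, pt):
        for y0 in range(0, height, ph):
            for x0 in range(0, width, pw):
                # Flatten one block into a single token-like vector.
                patches.append([
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ])
    return patches


def make_video(frames, height, width):
    """Build a toy clip whose values encode their own (t, y, x) position."""
    return [[[t * height * width + y * width + x for x in range(width)]
             for y in range(height)]
            for t in range(frames)]


video = make_video(frames=4, height=4, width=6)
image = make_video(frames=1, height=4, width=6)  # an image as a 1-frame video

video_tokens = extract_spacetime_patches(video, pt=2, ph=2, pw=2)
image_tokens = extract_spacetime_patches(image, pt=1, ph=2, pw=2)
print(len(video_tokens), len(image_tokens))  # 12 patches vs 6 patches
```

The same function handles both inputs; only the number of patches changes. That is what lets a patch-based model accept clips of different durations, resolutions, and aspect ratios as sequences of different lengths.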

For a mind map of the technical report, see https://anyubenyu.com/sora-technical-report-mindmap/




Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.


Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.


Prompt: A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures…


Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.


Prompt: Tour of an art gallery with many beautiful works of art in different styles.


Prompt: Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.


