OpenAI keeps teasing the capabilities of its Sora generative video model, and the latest clips come closer to a Hollywood production than anything we've seen from AI to date, all from a single prompt.
Sora isn't available to anyone outside of OpenAI and a select group of testers, but we are getting an insight into what is possible as those testers share its output on social media.
In the first round of video releases we saw scenes of dogs playing in the snow, a couple in Tokyo and a flyover of a gold-mining town in 19th-century California.
We are now seeing clips generated from a single prompt that look like complete productions, with multiple shots, effects and consistent motion across videos up to a minute long.
What are some of the new clips?
"fly through tour of a museum with many paintings and sculptures and beautiful works of art in all styles"Video generated by #Sora pic.twitter.com/SNr9dQZe5VMarch 2, 2024
The clips we've seen hint at the future of true generative entertainment. When Sora's output is combined with other AI models for sound and lip syncing, or with production-level platforms like LTX Studio, that level of creativity becomes truly accessible.
Blaine Brown, a creator on X, shared a music video that combined the Sora alien posted by Bill Peebles with Pika Labs' Lip Sync and a song created using Suno AI.
The fly-through of the museum shared by Tim Brooks is impressive for the variety of shots and the flowing motion it achieves, looking like a drone video shot indoors.
Others, such as a couple having a meal in a glorified fish tank, show off its handling of complex motion while keeping the flow consistent across the full clip.
How does Sora compare?
"This Sora clip is 🔥 when the alien guy busts out in a lip-synced rap about how tough it is being different than everyone else. Workflow in the thread. @suno_ai_ @pika_labs (lip sync) Alienate Yourself 🆙🔊🔊" (March 3, 2024) pic.twitter.com/kc5FI83q5R
Sora is a significant moment in AI video. It combines the transformer architecture behind chatbots like ChatGPT with the diffusion approach used by image generators such as Midjourney, Stable Diffusion and DALL-E.
Right now it can do things that aren't possible with any of the other big AI video models, such as Runway's Gen-2, Pika Labs' Pika 1.0 or Stability AI's Stable Video Diffusion 1.1.
At the moment the available AI video tools create clips of between 1 and 4 seconds and sometimes struggle with complex motion, though their realism is nearly as good as Sora's.
However, other AI companies are taking note of what Sora can do and how it was built. Stability AI has confirmed that Stable Diffusion 3 will follow a similar architecture, and a video model built on it is likely to follow eventually.
Runway has already made tweaks to its Gen-2 model that deliver much more consistent motion and characters, and Pika has unveiled Lip Sync as a standout feature to bring more realism to its characters.