ChatGPT may be stepping up its game with the release of GPT-4, but it has been beaten to the punch in the race to create AI-generated video.
New York startup Runway Research has announced Gen 2, a new system that can produce 3-second looping video clips from either text or image prompts. It is not publicly available yet, but a promotional video promises its arrival “very soon,” and you can sign up for the waiting list now.
The previews on Runway’s own site are interesting. The videos resemble super-charged GIFs, but the promise is that whatever you can imagine, you can create. Both Google’s Bard AI and ChatGPT have demonstrated text-to-video creations, but neither has gone beyond the testing phase. Some had been expecting GPT-4 to launch with video capabilities, but while it can process and edit images, there is no video functionality currently.
Runway is primarily a video and image editing service, and its AI can also layer different textures or effects on top of an existing video, matching them to the footage frame by frame. This opens up all kinds of possibilities and lets you, say, turn an existing video into a cartoon. A video example on Runway’s own website shows Dalmatian-style spots being transposed onto a golden retriever.
How does Runway AI video work?
In a research paper titled “Structure and Content-Guided Video Synthesis with Diffusion Models,” Runway outlines how the technology works. It’s fairly heavy going, but the authors summarize it as follows:
“Our latent video diffusion model synthesizes new videos given structure and content information. We ensure structural consistency by conditioning on depth estimates while content is controlled with images or natural language. Temporally stable results are achieved with additional temporal connections in the model and joint image and video training. Furthermore, a novel guidance method, inspired by classifier-free guidance, allows for user control over temporal consistency in outputs.”
In layman’s terms, Runway uses a diffusion model that keeps the structure of a source video while taking its content from an image or text prompt. The model has been trained on a large pool of uncaptioned videos and text-image data, which gives the AI the context it needs.
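For readers curious about the “guidance” part of that abstract, the general idea behind classifier-free guidance can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Runway’s implementation: the function names, the two scale parameters, and the toy numbers are all hypothetical. The sketch assumes a model that can make three noise predictions for the same clip: one treating each frame on its own, one using the temporal connections but no prompt, and one using both. It then blends them with two user-controlled dials.

```python
from typing import List

Vector = List[float]


def guide(baseline: Vector, target: Vector, scale: float) -> Vector:
    """Move from `baseline` toward `target` by a factor of `scale`.

    scale = 0 returns the baseline, scale = 1 returns the target,
    and scale > 1 overshoots the target (stronger guidance).
    """
    return [b + scale * (t - b) for b, t in zip(baseline, target)]


def guided_prediction(
    per_frame_uncond: Vector,  # hypothetical: frames predicted independently, no prompt
    video_uncond: Vector,      # hypothetical: temporal connections on, no prompt
    video_cond: Vector,        # hypothetical: temporal connections on, with prompt
    temporal_scale: float,     # assumed dial for temporal consistency
    content_scale: float,      # assumed dial for prompt adherence
) -> Vector:
    # Dial 1: how strongly to favor the temporally aware prediction
    # over the frame-by-frame one.
    temporally_guided = guide(per_frame_uncond, video_uncond, temporal_scale)
    # Dial 2: standard classifier-free guidance, adding the scaled
    # difference between the prompted and unprompted predictions.
    return [
        tg + content_scale * (c - u)
        for tg, c, u in zip(temporally_guided, video_cond, video_uncond)
    ]


if __name__ == "__main__":
    # Toy two-element "predictions" standing in for real model outputs.
    result = guided_prediction(
        per_frame_uncond=[0.0, 0.0],
        video_uncond=[1.0, 2.0],
        video_cond=[2.0, 4.0],
        temporal_scale=1.0,
        content_scale=2.0,
    )
    print(result)  # [3.0, 6.0]
```

With both dials set to 1, the result is simply the prompted video prediction. Turning either dial above 1 exaggerates that signal, which is the sense in which guidance gives the user control over the output.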
Perhaps more so than with AI chatbots, the potential for AI-generated video to cause harm is very real. While deepfake videos and images are becoming more commonplace, there is still an implied authenticity around video content that we will need to start reconsidering. As with any new technology, some people will use it to create upsetting content, so filters will need to be put in place.
We’ll keep an eye on Runway and will test it out once it’s ready to launch.