Stable Diffusion creator adds video to its generative AI model — here's what it can do

StabilityAI says any image can be turned into a video.

StabilityAI, the company behind the Stable Diffusion artificial intelligence image generator, has added video to its playbook.

The new model is built on top of its existing image tool and will allow users to turn any image into a video at the press of a button. Currently, it's only a research preview and not available for commercial use, but StabilityAI says this early release is perfect for hobbyists and educational purposes.

The terms and conditions ban creators from using it to produce content that passes itself off as a representation of people or events — no deep fakes here.

What can it do?

Like the early versions of Runway’s video generation tools, Stable Video Diffusion (SVD) is image-to-video, so you need a starting image to kick things off. Runway also has a text-to-video function, as will Meta’s new Emu Video when it's released. SVD was trained on a dataset of millions of videos and then fine-tuned for accuracy on a smaller selection of labeled clips. The source of the training data is likely a public research library of videos, which would also explain the non-commercial license.

The demonstration videos seem to show that it can produce short video clips that are near-photorealistic, though not perfect, at high-definition resolution. The research paper says it can generate 25 frames per second at 576 x 1024.

Is it as good as it sounds?

This version also has several limitations. It can only produce four-second clips in its initial incarnation, although that is the same as Runway.

According to StabilityAI, this new model is unable to generate video clips from a text prompt. It only works when given an image as a starting point. Its bigger issues come from how you might want to use it. For example, it might produce very slow camera pans or no motion at all.

However, it could be adapted in the future to offer 360-degree views of an object within a video, allowing for full panning. The company is also working on text-to-video versions that would allow users to create a video from a simple line of text.

The goal is likely to license the model to companies for inclusion in other products such as video editors, advertising tools, and even education for teachers to create more interactive lessons.
