Google has launched a new artificial intelligence video model called Lumiere that it claims can create consistent, smooth and realistic movement across a full video clip.
Many existing AI video models struggle with consistency of movement: even if they manage to capture a natural walk, other elements will be choppy or merge into the scenery.
Lumiere takes a different approach to generating video. Instead of putting together individual frames, it creates the entire video in one process, handling both the placement of objects and their movement simultaneously.
While the preview clips look impressive, Lumiere isn’t available to try yourself as it is just a research project. However, the underlying technology and approach to AI video could find their way into a future Google product, which would make Google a major player in the space.
How does Lumiere work?
Lumiere works across text-to-video and image-to-video, offering stylized generation from a reference image to fine-tune exactly how an element within the video will look. Some of this is already possible with the Runway and Pika Labs models.
This AI model is built on a space-time architecture, and while that sounds like something out of a science fiction movie, in reality it means the model considers all aspects of motion and location.
During its generation process the model examines where things should be placed, or the “space” aspect of the clip, as well as when and how things move, or the “time” element. It handles both aspects at the same time in a single run-through to create consistent motion.
The researchers wrote in a preprint paper on the model: “Our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales.”
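Google hasn’t released Lumiere’s code, but the core idea of treating space and time together can be sketched in a few lines. The toy PyTorch snippet below is an illustration only, not Google’s implementation: the module name, channel counts and clip size are all invented. It downsamples a video clip across its time, height and width axes in a single 3D convolution, so where things are and how they move are processed in one pass:

```python
# Toy sketch of simultaneous space-time downsampling (assumption: this is
# an illustrative stand-in, not Lumiere's actual Space-Time U-Net).
import torch
import torch.nn as nn

class SpaceTimeDownBlock(nn.Module):
    """Shrinks a video in space AND time with one 3D convolution,
    so object placement (H, W) and motion (T) are handled together."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # stride=2 on every axis halves time, height and width at once
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)
        self.act = nn.SiLU()

    def forward(self, video):
        # video shape: (batch, channels, time, height, width)
        return self.act(self.conv(video))

# A 16-frame, 64x64 RGB clip becomes a coarser space-time representation
clip = torch.randn(1, 3, 16, 64, 64)
block = SpaceTimeDownBlock(3, 32)
print(block(clip).shape)  # torch.Size([1, 32, 8, 32, 32])
```

A full space-time U-Net would stack many such blocks, with attention layers and a matching upsampling path back to full resolution; the point here is simply that a single operation treats the frame axis the same way as the spatial axes, rather than generating frames one at a time.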
What else can Lumiere do?
When generative AI video first emerged, the primary focus was making a short video clip, but as the technology matures other features are starting to appear. Runway, for example, offers the ability to highlight different regions of an image and have them animated independently.
The Google Research team says Lumiere achieves “state-of-the-art text-to-video generation results” and “facilitates a wide range of content creation tasks and video editing applications.”
As well as the promise of smoother motion, they say it can animate specific regions of an image with relative ease and offer inpainting capabilities, such as changing the style of clothing or the type of animal featured within a frame.
Are we ever likely to see Lumiere in the real world?
A lot of the research projects put out by companies like Google, Microsoft and Meta never see the light of day in their preview form. However, the underlying technology often finds its way into branded products.
This isn’t even the first AI video tool from Google. The company has a video version of its Imagen model (the model that powers AI image generation in Google Cloud), and VideoPoet, a large language model for zero-shot video generation.
VideoPoet can also create audio from a video clip without requiring text as a guide. Google says the model can continuously generate one-second extensions to produce a video of any duration with strong object identity. It, too, is not currently available to the public.
The answer to the question of whether we will see Lumiere in the real world comes down to how well it is received by researchers and whether Google has a project worthy of its inclusion. It may be that, like Imagen, it is largely reserved for third-party developers using Google Cloud.