Runway, one of the leading artificial intelligence video generation platforms, has added a new lip sync feature to its growing toolkit and made it available to all users.
Being able to animate mouth and facial movements in time with an audio clip brings an entirely new dimension to AI video, giving characters a voice and making them feel interactive.
Pika Labs launched its lip sync feature earlier this month, and both tools use the impressively natural-sounding synthetic voices from ElevenLabs.
The new implementation from Runway also allows you to clone your own voice from within the interface using ElevenLabs technology. It builds on previous audio generation options and is available under the Generative Audio section of the Runway interface for all users.
Testing Runway’s lip sync feature
Lip sync is an important feature if AI video is to become more mainstream, but for it to work convincingly it can't just animate the mouth; it needs to animate the whole face, and this does just that.
To put it to the test, I created a series of characters facing the camera using Leonardo, which is built on Stable Diffusion technology. I also used the new Runway feature with a character I made using MidJourney's consistent character feature.
I then ran the Leonardo images through the Runway Generative Voice tool with a random script and the closest-matching synthetic voice. For the MidJourney image I recorded a sample of my own voice to see how it handled real sound.
The firefighter
The first character I tested was an intense-looking firefighter in full uniform with a blurred background. Picking a default female American voice, I had her talk about the world being on fire and the need to run.
I was genuinely impressed with how well Runway not only sticks to the lip outline during the animation but also moves the head in time with what is being said.
The ancient expert
In Civilization 2 there was a High Council of advisors: essentially pre-recorded videos of people in period costumes talking to camera, with lines about the game you're playing such as "we need more soldiers".
I created a character inspired by this concept and had her talk about adventure. The voice I selected was on the softer side, but I think the tool did a good job animating a more CGI-style depiction of a character.
The old man
Age is just a number, but as we age our faces change, becoming more weathered and lived-in, and our voices often get deeper. I picked this near-realistic depiction of an old man sitting in a chair, leaning into the camera.
This was one of the least realistic lip animations, with teeth peeking through in slightly uncanny ways. But it still looks pretty good for AI-generated content.
A distant discussion
This was the hardest test for the AI: it had to interpret a face that was further from the camera and not directly facing it.
Runway didn't do a terrible job; it even attempted some head movement animation, but the uncanny valley effect is stronger here than in any of my other experiments. Only half the mouth is animated.
Using my own voice
For this test I used a photorealistic image of a character I created in MidJourney and then used a recording of my own voice for the lip sync.
This was a two-for-one test, as it also put Runway's ability to handle a beard through its paces. It struggled, but not as much as other AI lip syncing tools I've tried, and it handled the voice recording as well as it handled the synthetic text-to-voice output.
Animating a video
All of the other experiments used an image as the starting point, relying entirely on Runway to add any movement and stop it feeling static. This test started with a video.
It was created using Leonardo and its Motion feature, showing an older man staring into the distance as if he's seen something wonderful. Runway not only animated the mouth properly but also fixed an issue with the mouth in the original generation and synced it to the voice I selected. This may have been the most impressive of the tests.
The action figure test
Finally, I put Runway to the test animating the mouth of an action figure. I ran a similar test on Pika Labs when it rolled out lip sync, and it did a good job, adding properly expressive lip movement.
In this instance I don't think Runway did as well as Pika Labs. While the lips were more natural and realistic (for a plastic toy) and it moved the head, that added realism actually made it less useful for this type of character.
Bottom Line: Is Runway lip sync any good?
Lip sync in AI video is a relatively new but important field. It gives characters much more ... character than is otherwise possible, and it allows for conversation, though the whole space still feels very new.
Runway's approach is very impressive and much more realistic than similar models I've tried. It is actually closer to the more expensive and complex digital human lip syncing from Nvidia, which is usually reserved for developers.
It also allows you to create longer monologues or pieces of dialogue than other tools, and with the head movement animation it feels more realistic.