Runway, one of the leading artificial intelligence video generation services, has added a new text-to-speech feature to its platform. This allows users to create voiceovers for projects and select from several realistic-sounding but synthetic voices.
The company was founded in 2018 and released the first publicly available, commercially licensed video-to-video model early in 2023. Known as Gen-1, it was accessible through Discord and recreated video clips using artificial intelligence.
With Gen-2 came a new web platform and the ability to turn images and text into video. The latest addition is a text-to-speech tool that can create multiple voices.
I tried it out and was genuinely impressed with how natural and varied the voices were. The realism was surprising, and it is exactly the type of advancement actors were concerned about during the recent SAG-AFTRA strike.
Creating a voiceover with Runway
The audio tool isn't particularly easy to find. It sits under the video menu with the title Generate Audio. I imagine future versions of Runway's editor will make it easier to generate a voiceover. For now, it is a standalone tool.
There are a number of audio services available, including removing silence from an existing clip, cleaning up background noise, and, of course, generating speech from text.
To test how well it works, I created a short video using Gen-2 and images I recently generated with Midjourney version six. I had ChatGPT write a brief script featuring two characters and used the voiceover tool to turn the script into sound.
How easy is it to use?
Very easy, if a little clunky. Each clip you generate using Runway appears on the right-hand side of the screen. The text input is on the left, as is the selection of voices. Unlike ElevenLabs, it doesn’t let you clone your own voice or select from a broad library of voices, but the quality is the same.
For this project, I had two characters: a soldier and an officer in the human Martian army, battling humans from Earth sent to end the Martian fight for independence.
I was able to enter the words I wanted each character to speak, generate the audio using that character's voice, and have it appear as a playable, downloadable sample on the right. You could also generate all of a character's lines at once, then cut the audio up in an editor later.
What is the sound like?
I found the sound was better than expected. AI voice tools sometimes struggle with emphasis and emotion. While it wasn’t perfect, it did capture pauses in the right places and was considerably more natural than I expected, especially when paired with video or sound effects.
If this is how far we've come in a few months, I'd be very worried if I were a voice actor working in radio or games.