The thrill that comes from generating text or images with an AI-powered tool is short-lived, but alluring. If there’s any reason AI caught the attention of normal people in 2023, it’s because you shouldn’t be able to type a few sentences and get out paragraphs of text or dozens of images. And yet, now with the right website or app, you can. We’re used to instantly getting what we want online, but not like that.
It also helps that text and image generators have made real strides in the last year in terms of the perceived quality of what they produce. The difference in detail, particularly in the backgrounds of images, between a picture generated by OpenAI’s DALL-E 3 model and one generated by DALL-E 2 is noticeable. Text responses from GPT-4 appear to be more accurate and useful than those from GPT-3.
It would be natural to expect video generation — a potentially even more disruptive AI tool considering the internet’s love of video — would be next, right? Well, if Pika, a new AI video creation and editing tool, is anything to go off of, we still have a while to wait before generative AI videos are ready for their closeup.
Pika 1.0
Pika introduced itself to the public in November 2023 with the news that it had raised $55 million from investors and was making its “Pika 1.0” model available to the public. The goal of the company, according to the press release that currently doubles as its “About” page, is “to enable everyone to be the director of their own stories and to bring out the creator in each of us.” Other companies, like Runway and Stability AI, are looking to create similar models and tools, but Pika is decidedly playful and consumer-oriented. Currently, video production has multiple barriers to entry, from the knowledge required to use production tools to the manpower needed to actually create a video. Pika makes creating a visual simple. It’s “a future interface of video making that is effortless and accessible to everyone.”
What that looks like currently is a page with two tabs and a text field at the bottom for typing in prompts or dropping in image and video files to base clips on. The first tab, called “Explore,” is a grid of video clips meant to act as inspiration. You can download these clips, view the prompts used to create them, and edit them right from the same page. The second tab, called “My Library,” shows all of the clips you’ve generated and lets you make edits, re-generate clips based on the same prompt, extend their length (by four seconds at a time), or modify a specific section of a clip using a neat resizable selector tool.
The setup is bare bones, maybe even frustratingly so, but from the outset, I think Pika’s prompt field is the real star. You can type out text instructions just like in any other generative AI tool, and Pika will attempt to understand and generate a clip based on your instructions, including taking specific style notes into account. For example, I can ask for a clip that “looks like a Studio Ghibli movie” or is “3D animated.” You can also load an image or video to build the clip around, asking it to be filtered or animated in a specific way. Once your prompt is set, you can pick an aspect ratio and (unique to Pika, from what I can tell) select a camera movement like pan, zoom, or circle to give some extra motion to your clip.
A Lot More Work To Do
The problem is, I don’t think the clips Pika makes are particularly good. Or at the very least, they require a lot of massaging and tweaking to be usable as anything other than a novelty. My issues really begin with the whole framing of Pika itself. It’s true that professional video production isn’t easy, but we’ve had simpler, social video-oriented editing tools like CapCut for years now that have been more than enough for most people. Also, it’s cliché at this point, but almost everyone owns a phone that can shoot acceptable, if not great, video. I don’t need Pika to make video more effortless or accessible.
If Pika is instead meant to solve a lack of creativity, or a lack of resources to realize out-there ideas, I have issues with that too. I’m a writer. I studied screenwriting in school. I know what it feels like to have images in your head that far outstrip what you can reasonably achieve with your school’s cameras and your friend’s free time. Pika prompts you to “describe your story” every time you look at an empty text field, but in my experience, the service doesn’t respond well to narrative. Short, robotic prompts are far more likely to produce something interesting than overly descriptive narrative text. As with plenty of other AI tools, you have to “think” like it does to get anything workable.
This is an issue with all current generative AI video tools, but if you want continuity between clips on Pika, you’re out of luck. You can base a new clip on a previous clip you created fairly easily, but there’s really no guarantee it will look anything like the first, even with the adjustments Pika allows you to make. There are similar issues with getting Pika to respond to specific style suggestions. Slapping “anime” at the end of a prompt is not nearly enough to get the clip inspired by Suzume or The Boy and the Heron that you’re probably imagining. You have to be willing to put in the work and deal with a lot of trial and error.
Baby Steps
There’s a reason this is Pika 1.0 and the current experience the company is offering is free. With time and money, Pika will improve, just like other AI models and tools. But Pika is also a good reminder that text is far easier to produce than video. That should be obvious, but play around with ChatGPT and DALL-E long enough, and it’s easy to get cocky.
Pika looks a lot like a stock footage site to me, and as it stands, stock footage is the only thing it even gets close to being able to replace. Even that is only because it seems easier to hide the flaws in landscape shots or quick inserts than in action shots with faces in them. Those clips still wouldn’t make the cut in a professional production house.
Pika is easier in some ways than getting out a camera yourself, sure, but it will never be spontaneous in the same way that working with real people is. That’s fine in a pinch, but does it awaken the creator in you? No. You’re better off doing it yourself.