Google has launched its latest AI tool, Whisk, which promises to make image creation simpler by allowing users to prompt with images instead of lengthy text, making the process as easy as dragging and dropping.
Unlike typical AI image generators, which rely on detailed text prompts, Whisk lets users upload images and generate AI-created visuals from them. Whether the input is a subject, a scene, or a style, Whisk combines these elements into one cohesive image without requiring a single line of text.
A Fun Tool
Whisk is designed as a "creative tool" rather than a professional image editor.
According to Google, the goal is to offer quick inspiration and let users experiment with different combinations of inputs. The tool is intended to provide fun, unique results rather than refined, pixel-perfect imagery.
How It Works
Whisk uses Google's Gemini model to automatically generate detailed captions for each uploaded image. These descriptions are then processed by Imagen 3, Google's latest image generation model.
Rather than creating an exact replica of the input images, the tool captures the essence of the subject, allowing users to remix various elements in unique ways.
Users can upload multiple images: one representing the subject, another depicting the scene, and a third showcasing the desired style.
Whisk then synthesizes these inputs to create something new, giving users the ability to visualize different combinations quickly, CNN reports.
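The flow described above (caption each input image, then hand the combined description to an image model) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Google's implementation: the names `WhiskInputs`, `build_prompt`, and `remix` are hypothetical, the prompt wording is invented, and the captioning and generation steps are passed in as plain functions standing in for Gemini and Imagen 3 rather than calling any real API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class WhiskInputs:
    """Hypothetical container for the three uploaded images (paths or IDs)."""
    subject: str
    scene: str
    style: str


def build_prompt(subject_caption: str, scene_caption: str, style_caption: str) -> str:
    # Assumed prompt template; the real wording used by Whisk is not public here.
    return (
        f"{subject_caption}, set in {scene_caption}, "
        f"rendered in the style of {style_caption}"
    )


def remix(
    inputs: WhiskInputs,
    captioner: Callable[[str], str],   # stand-in for the Gemini captioning step
    generator: Callable[[str], str],   # stand-in for the Imagen 3 generation step
) -> Tuple[str, str]:
    # Step 1: turn each uploaded image into a text description.
    subject_caption = captioner(inputs.subject)
    scene_caption = captioner(inputs.scene)
    style_caption = captioner(inputs.style)

    # Step 2: merge the three descriptions into a single text prompt.
    prompt = build_prompt(subject_caption, scene_caption, style_caption)

    # Step 3: generate a new image from the prompt, not from the original pixels.
    return prompt, generator(prompt)
```

Because only the captions reach the generator in this sketch, the output reflects the described essence of each input rather than an exact copy, which matches the behavior the article attributes to the tool.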
"Whisk is designed to allow users to remix a subject, scene and style in new and creative ways, offering rapid visual exploration instead of pixel-perfect edits," Thomas Iljic, a director of product management at Google Labs, said in a statement.
Room for Experimentation, Not Precision
While Whisk offers an innovative way to explore visual ideas, it's important to note that the results may not always meet expectations in terms of accuracy. Since the AI extracts only a few key characteristics from each input, the generated image might differ in height, weight, skin tone, or other features.
Google encourages users to embrace this as part of the creative process. For those who want more control, Whisk allows users to edit the underlying prompts and adjust the results as needed.
Available Now in the US
The new tool is currently available in the United States, and users can try it out by visiting labs.google/whisk.
Whisk is powered by generative AI models developed by Google DeepMind, the AI lab that Google acquired in 2014.
Veo 2: The New Video Generation Model
In addition to Whisk, Google has introduced Veo 2, the next version of its video generation model. The updated model is designed to better understand "the unique language of cinematography" and to create more realistic video content. Veo 2 also reduces errors such as adding extra fingers to a person's hand, a common issue with earlier AI models, according to The Verge.
Available Soon on Google VideoFX
Veo 2 will first be available through Google's VideoFX, and users can join the waitlist on Google Labs to access it. The model is set to expand to YouTube Shorts and other products in the coming year.