TechRadar
Mike Moore

Amazon unveils surprise new video and image AI models to compete with the best on the market

Amazon Nova image generation AI model.

  • Amazon unveils new image and video creation AI tools
  • Amazon Nova Canvas and Nova Reel look to help ecommerce sellers
  • Both new Nova models set to launch in 2025

Amazon has announced new image and video generation models as it steps up its fight to become an AI heavyweight.

The company unveiled Amazon Nova Canvas and Nova Reel at its AWS re:Invent 2024 event in Las Vegas, with CEO Andy Jassy revealing the launch as part of a new Nova series of AI models.

Both new models will be available in mid-2025, putting Amazon in direct competition with the likes of OpenAI and Grok in image and video creation.

Amazon Nova Canvas and Reel

The new models look to initially target sellers and other users on Amazon's ecommerce platform, allowing them to quickly and cheaply create media content to enrich their pages.

Amazon shared few specifics about the new offerings, but said Nova Canvas will allow users to create and edit images using natural language text inputs, while Nova Reel can provide "studio-quality" video, with features such as camera motion control, 360-degree rotation, and zoom.

In a blog post announcing the news, the company noted that customers on its Amazon Ads platform using the new models advertised five times more products and twice as many images per advertised product, widening their reach to buyers across the globe.

Looking forward, Jassy also revealed Amazon will be launching a Speech-to-Speech generation model in early 2025, followed by an "Any-to-Any" model in mid-2025.

The former will be able to analyse and understand streaming speech input in natural language, interpreting verbal and nonverbal cues such as tone and cadence in order to reply in a natural, human-like way.

The latter, which Jassy described as a true multimodal-to-multimodal model, will be able to take in text, images, audio, and video, and produce output in whichever mode is required.
