OpenAI has unveiled a new model with real-time speech and vision capabilities.
Announced during a livestream event hosted by the company, the model can respond to verbal and visual prompts without a noticeable time lag.
Chief technology officer Mira Murati said GPT-4o would be offered for free because it is more efficient than the company's previous models, while paid users of GPT-4o will have greater capacity limits than those on the free version.
During the livestream, the model was able to solve maths equations shown to it via an iPhone camera, as well as read out text and adapt its speech style in response to verbal prompts.
The model was also able to hold a conversation with a presenter on stage, including offering advice on breathing techniques to reduce stress and assessing breathing sounds. However, there were signs during the demonstration of the model misunderstanding some cues and prompts, with presenters forced to repeat or reword questions to elicit the right response.
“GPT-4o provides GPT-4 level intelligence but it is much faster and it improves on its capabilities across text, vision and audio,” Murati said.
“For the past couple of years we’ve been focused on improving the intelligence of these models and they’ve got pretty good. But this is the first time we’re making a huge step forward when it comes to ease of use.
“We’re looking at the future of interaction between ourselves and machines and we think that GPT-4o is really shifting the paradigm into the future of collaboration where this interaction becomes much more natural and far far easier.”
ChatGPT became the fastest application ever to reach 100 million monthly active users after its launch in late 2022. It is thought that giving ChatGPT search engine-like qualities, responding to prompts with real-time, up-to-date information, will give OpenAI the edge over its competitors. The mobile-based demonstration during the livestream could also be seen as part of a strategic shift towards encouraging greater smartphone-based use of ChatGPT.
The timing of OpenAI’s announcement has been seen as a tactical move, given it was scheduled for the day before Google's annual developer conference, at which the search giant is expected to showcase its own new AI features.
Chris Stokel-Walker, author of How AI Ate the World, told the Standard: “OpenAI's announcement will be seen as revolutionary for some, and certainly the use cases were impressive - but they highlight that we're still being taken in by the concept of 'artificial' in 'artificial intelligence'.
“The same issues that have long existed in AI - that these models are pattern-matching, and don't really think for themselves - have been glossed over with a smooth voice and video interface, which will make the technology easier to adopt for those who have held out so far.
“But it also makes it more likely that people forget that they're not interacting with a sentient being - which could come with ramifications if we continue to trust its output as the absolute, verifiable truth.”