Google recently acknowledged that the image generation feature within its conversational AI app, Gemini, produced some inaccurate and maybe even offensive results. The company paused the feature while it looked into what needed to be done to correct it.
It's easy to laugh these mistakes off or to get offended at their absurdity, and some people even go as far as thinking there is some sort of conspiracy with racial undertones.
This is possible but extremely unlikely. Google's business is telling you what you want to know; it isn't in business to make the world a better place. Its purpose is to make money, and controversy doesn't help it do that.
So what went wrong, and why did Gemini falter in its attempts to produce realistic images of people?
Too Much of a Good Thing?
OK I assumed people were exaggerating with this stuff but here's the first image request I tried with Gemini. pic.twitter.com/Oipcn96wMh (February 21, 2024)
One of the main issues was an over-tuning for inclusivity and diversity. Google wanted to combat potential biases in its image generation model. Unfortunately, the tuning had unintended side effects. Instead of simply avoiding unfair stereotypes, Gemini sometimes appeared to insert diversity where it was neither historically accurate nor appropriate for the given prompt. A request for a "1940s doctor" might result in images featuring doctors of various ethnicities, even though that wouldn't have been an accurate representation during that time.
Google needs to do this, and it has nothing to do with being "woke". The people who program and train AI models do not represent everyone. For example, Joe from Indiana doesn't have a lot in common with Fadhila from Tanzania. Both can use Google Gemini and both expect inclusive results. Google just went too far in one direction.
To ensure inclusivity and avoid bias, Gemini's image generation was tuned to prioritize diverse representation in its outputs. However, that tuning produced the wrong results in certain situations, particularly for prompts tied to a specific historical or cultural context.
When users requested images of people in specific contexts, the model wouldn't always generate accurate images, instead prioritizing showing individuals from various backgrounds regardless of their suitability for the specific prompt. This is why we saw things like an African-American George Washington or a female Pope. AI is only as smart as the software that powers it because it's not actually intelligent.
To its credit, Google has acknowledged the mistake and hasn't tried to dodge the issue. Speaking with the New York Post, Jack Krawczyk, Google’s senior director of product management for Gemini Experiences, said:
“We’re working to improve these kinds of depictions immediately. Gemini’s AI image generation does generate a wide range of people. And that’s generally a good thing because people around the world use it. But it’s missing the mark here.”
In addition to being weighted for diversity and inclusiveness, the model was also designed to be cautious about avoiding harmful content or replicating harmful stereotypes. This caution, while well-intentioned, turned into a problem. In some cases, Gemini would avoid generating certain images altogether, even when there seemed to be no harmful intent behind the prompt.
These two issues combined led to a situation where Gemini sometimes produced strange or inaccurate images, especially when it came to depicting people. Generative AI is very different from the AI that powers many of the other Google products you have installed on your phone, and it requires more attention.
The Way Forward
Google has recognized these issues and the need to balance inclusivity against historical and contextual accuracy. It's a difficult challenge for generative AI models. While preventing the reinforcement of harmful stereotypes is a noble goal, it shouldn't come at the expense of the model simply doing what it's asked to do.
Finding that balance is crucial for the future success of image-generation AI models. Google, along with other companies working within this space, will need to refine their models carefully to achieve both inclusive results and the ability to accurately fulfill a wider range of user prompts.
It's important to remember that these are early stages for this type of technology. While disappointing, these setbacks are an essential part of the learning process that will ultimately lead to more capable and reliable generative AI.
Generative AI models require fine-tuning to achieve the balance between inclusivity and accuracy. When attempting to address potential bias, models can become overly cautious and produce incomplete or misleading results. Developing more robust image-generation AI is an ongoing challenge.
Where Google went wrong was in not explaining what happened in a way that regular folks would understand. Few people are interested in how AI models are trained, but understanding why it's done a certain way is important in this context. Google could have written this article on one of its many blogs and avoided much of the controversy around Gemini being bad at something.