How guardrail misfires and Elon Musk turned AI safety into a partisan fight

Making AI safe, once a consensus goal for the industry, has become an ideological battleground.

Why it matters: Like "election integrity" in politics, everyone says they support "AI safety" — but now the term means something different depending on who's saying it.

Driving the news: The noisy departure of the head of OpenAI's "superalignment" team, charged with limiting any harm from advanced AI, reignited a long-running Silicon Valley debate on AI safety.

  • Critics say the industry's push to popularize AI is eclipsing its promises to develop the technology responsibly.
  • OpenAI CEO Sam Altman has long argued, and now most AI makers agree, that the best way to surface and defuse AI's many potential misuses is to put it into the general public's hands.

Zoom out: Safe AI has multiple meanings that cover a range of dangers.

  • No one wants AI going off on its own and plotting to wipe out humankind.
  • Few of us want AI spreading harmful information or misinformation — like accurate instructions for making bioweapons or inaccurate labels for toxic mushrooms.
  • Most of us don't want AI discriminating against people based on traits like their skin color or their gender.
  • Most of us would like AI to provide a fact-based record of historical and current events.

The phrase "AI safety" first came into use a decade ago with the rise of concern among researchers about AI's "existential risks" — their fear that an advanced AI would develop its own agenda hostile to humanity (like "maximize paper clip output"), become deceptive over time and end up destroying civilization.

  • That was something to maybe try to avoid — even if the doomsday scenarios were vague and far-fetched. So the original AI safety agenda aimed at avoiding any kind of paper clip apocalypse.

As AI began moving from the lab to our laptops, a different sort of risk emerged: Ethics specialists and social researchers sounded alarms about the prevalence of bias in AI algorithms.

The rise of ChatGPT and generative AI in 2022 brought a new kind of safety risk to the fore.

  • Suddenly, AI trained on essentially the entire internet was moving into our lives to answer our questions and invent pictures and stories.
  • The internet is full of both wonders and horrors. ChatGPT and its competitors reflected both.
  • If you wanted to stop your AI from telling lies about QAnon, Barack Obama's birthplace or COVID-19 vaccines' safety, you had to do something.

Enter "guardrails." Retraining the foundation models that drive the AI revolution so that they're grounded in fact would take many months and vast sums of money.

  • Silicon Valley firms racing to deploy and profit from genAI weren't willing to do that. So they added patchy fixes to reduce the volume of bias, lies and hate speech their products generated.
  • The unpredictable, "black box" nature of genAI meant that these guardrails would only be partially effective.

Case in point: You might want to make sure your image generator didn't only portray professionals with white skin.

  • But if you turned up the knobs on your guardrails too high, you might end up with an all-Black portrait of the U.S.'s founding fathers.

To the right, such overzealous guardrails became proof that the AI created by tech giants and leaders like OpenAI and Google had become "politically correct" or "woke" and could not be trusted.

  • Elon Musk has led the effort to rebrand AI safety to mean removing the guardrails meant to keep AI from producing antisemitic, racist or otherwise offensive speech.
  • Musk and his allies see such efforts as symptoms of a "woke mind virus" that seeks to censor the truth.

AI "should not be taught to lie," Musk said last month in a talk at the Milken Institute. "It should not be taught to say things that are not true. Even if those things are politically incorrect."

  • Musk's AI project, xAI, is following in the tracks of his effort to reshape Twitter, now X, as a "free speech zone" that's more tolerant of fringe and extremist views and less concerned about avoiding offense or harm to users and society.
  • If you believe that censorship is a greater danger than hate speech, you can call such an approach a form of "safety." (Musk has not hesitated to limit the speech of users on X when they criticize him or his companies.)

Our thought bubble: The U.S. public is sharply divided on so many issues of fact today — from the inflation rate to the outcome of the 2020 election — that expecting AI to determine or report "the truth" seems hopelessly naive.

  • The best models for establishing facts are transparent processes grounded in carefully evaluated evidence and open to wide participation from different groups and perspectives — something more like Wikipedia than an algorithm.

The other side: Some experts view the safety debate as overwrought and unnecessary.

  • "It seems to me that before 'urgently figuring out how to control AI systems much smarter than us' we need to have the beginning of a hint of a design for a system smarter than a house cat," AI pioneer and Meta executive Yann LeCun posted recently on X.

What's next: The struggle over AI safety will play out around the globe, as governments in China, India and other nations adapt the technology to suit nationalist or authoritarian agendas — and clothe their agendas in the rhetoric of risk reduction.

Go deeper: There's no such thing as "values-free" AI
