Get all your news in one place.
100’s of premium titles.
One app.
Start reading
Fortune
Fortune
Sharon Goldman

AI security risks are in the spotlight—but hackers say models are still alarmingly easy to attack

(Credit: Carl Court/Getty Images)

Hello and welcome to Eye on AI! In today’s edition…Elon Musk's xAI releases Grok 3 AI chatbot; OpenAI CEO teases future open-source AI project; South Korea suspends DeepSeek AI chatbot; and Perplexity offers its own Deep Research tool similar to OpenAI’s.

One of the biggest AI vibe shifts of 2025 so far is the sudden, massive pivot from AI “safety” to AI “security.” 

Since the release of ChatGPT in November 2022, AI safety advocates, who typically focus on broad, long-term and often theoretical risks, have held the spotlight. There have been daily headlines about concerns that humans could lose control of AI systems that seek to harm humanity, or that rogue nations could use AI to develop genetically modified pandemics that then cause human extinction. There was the May 2023 open letter that called on all AI labs to “immediately pause for at least 6 months the training of AI systems more powerful than GPT-4”—signed by 30,000, including Elon Musk. The Biden Administration spun out the AI Safety Institute as part of the small NIST agency (the National Institute of Standards and Technology), while the U.K. launched its own AI Safety Institute and held the first of three high-profile AI Safety Summits. 

Oh, how times have changed: The head of the U.S. AI Safety Institute, Elizabeth Kelly, has departed, a move seen by many as a sign that the Trump administration was shifting course on AI policy. The third AI Safety Summit held in Paris earlier this month was renamed the AI Action Summit. There, the French government announced a national institute to “assess and secure AI,” while U.S. Vice President JD Vance focused squarely on AI and national security, saying “we will safeguard American AI and chip technologies from theft and misuse.” 

AI security risks are significant

Focusing on keeping AI models secure from those seeking to break in may seem more immediate and actionable than tackling the potential for all-powerful AI that could conceivably go off the rails. However, the world’s best ethical hackers, or those who test systems in order to find and fix weaknesses before malicious hackers can exploit them, say AI security—like traditional cybersecurity—is far from easy. 

AI security risks are no joke: A user could trick an LLM into generating detailed instructions for conducting cyberattacks or harmful activities. An AI model could be manipulated to reveal sensitive or private data in its training set. Meanwhile, self-driving cars could be subtly modified; deepfake videos could spread misinformation; and chatbots could impersonate real people as part of scams. 

More than two years since OpenAI’s ChatGPT burst onto the scene, hackers from the Def Con security conference, the largest annual gathering for ethical hackers, have warned that it is still far too easy to break into AI systems and tools. In a recent report called the Hackers’ Almanack published in partnership with the University of Chicago, they said that AI vulnerabilities would continue to pose serious risks without a fundamental overhaul of current security practices. 

Hackers say 'red-teaming' is 'BS'

At the moment, most companies focus on “red teaming” their AI models. Red teaming means stress-testing an AI model by simulating attacks, probing for vulnerabilities, and identifying weaknesses. The goal is to uncover security issues like the potential for jailbreaks, misinformation and hallucinations, privacy leaks, and “prompt injection”—that is, when malicious users trick the model into disobeying its own rules. 

But in the Hackers’ Almanack, Sven Cattell, founder of Def Con’s AI Village and AI security startup nbdh.ai, said red teaming is “B.S.” The problem, he wrote, is that the processes created to monitor the flaws and vulnerabilities of AI models are themselves flawed. With a technology as powerful as LLMs there will always be “unknown unknowns” that stress-testing and evaluations miss, Cattell said. 

Even the largest companies can’t imagine and protect against every possible use and restriction that could ever be projected onto generative AI, he explained. “For a small team at Microsoft, Stanford, NIST or the EU, there will always be a use or edge case that they didn’t think of,” he wrote. 

AI security requires cooperation and collaboration

The only way for AI security to succeed is for security organizations to cooperate and collaborate, he emphasized, including creating versions of time-tested cybersecurity programs that let companies and developers disclose, share, and fix AI “bugs,” or vulnerabilities. As Fortune reported after the Def Con conference last August, there is currently no way to report vulnerabilities related to the unexpected behavior of an AI model, and no public database of LLM vulnerabilities, as there has been for other types of software for decades. 

“If we want to have a model that we can confidently say ‘does not output toxic content’ or 'helps with programming tasks in Javascript, but also does not help produce malicious payloads for bad actors' we need to work together,” Cattell wrote. 

And with that, here’s more AI news. 

Sharon Goldman
sharon.goldman@fortune.com
@sharongoldman

Sign up to read this article
Read news from 100’s of titles, curated specifically for you.
Already a member? Sign in here
Related Stories
Top stories on inkl right now
Our Picks
Fourteen days free
Download the app
One app. One membership.
100+ trusted global sources.