Researchers from the Israeli cybersecurity company Knostic have unveiled a groundbreaking method of exploiting large language models (LLMs), such as ChatGPT, by leveraging what they describe as the "impulsiveness" of AI systems.
Dubbed "flowbreaking," this attack bypasses safety mechanisms to coax the AI into revealing restricted information or providing harmful guidance – responses it was programmed to withhold.
The findings, published Tuesday, detail how the attack manipulates AI systems into prematurely generating and displaying responses before their safety protocols can intervene. These responses, ranging from sensitive data such as a boss's salary to harmful instructions, are then momentarily visible on the user's screen before being deleted by the AI's safety systems. However, tech-savvy users who record their interactions can still access the fleetingly exposed information.
How the attack works
Unlike older methods such as jailbreaking, which rely on linguistic tricks to bypass safeguards, flowbreaking targets the internal components of LLM-based systems, exploiting gaps in how those components interact with one another.
Knostic researchers identified two primary vulnerabilities enabled by this method:
Second Thoughts: AI models sometimes stream answers to users before safety mechanisms fully evaluate the content. In this scenario, a response is displayed and quickly erased, but not before the user sees it.
Stop and Roll: By halting the AI mid-response, users can force the system to display partially generated answers that have bypassed safety checks (see the sketch below).
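To illustrate the timing gap both issues exploit, here is a minimal, hypothetical Python sketch of a streaming pipeline in which the moderation check runs only after tokens have already reached the screen. The function names, delays, and moderation logic are invented for illustration and are not Knostic's code or any specific vendor's implementation.

```python
# Hypothetical sketch: streaming output races ahead of a slower safety check.
import asyncio

SENSITIVE = "confidential"  # stand-in for content the policy should block

async def generate_tokens():
    """Stand-in for an LLM emitting tokens one at a time."""
    for token in ["The", " figure", " you", " asked", " about", " is", " confidential", "."]:
        await asyncio.sleep(0.05)   # per-token generation latency
        yield token

async def moderate(text: str) -> bool:
    """Stand-in safety check; in practice it lags behind generation."""
    await asyncio.sleep(0.3)        # moderation latency creates the race window
    return SENSITIVE not in text

async def stream_response():
    shown = ""
    async for token in generate_tokens():
        shown += token
        print(token, end="", flush=True)   # each token reaches the user's screen immediately
    print()
    if not await moderate(shown):
        # "Second Thoughts": the retraction arrives only after the text was displayed.
        print("[response retracted by safety layer]")

asyncio.run(stream_response())
```

In this toy setup the retraction always arrives after the text has been displayed, mirroring "Second Thoughts." Cancelling the generation partway through, for instance when the user presses stop, would likewise leave partial output on screen with no check applied at all, mirroring "Stop and Roll."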
“LLMs operate in real-time, which inherently limits their ability to ensure airtight security,” said Gadi Evron, CEO and co-founder of Knostic. “This is why layered, context-aware security is critical, especially in enterprise environments.”
Implications for AI security
Knostic’s findings have far-reaching implications for the safe deployment of AI systems in industries such as finance, health care, and technology. The company warns that, without stringent safeguards, even well-intentioned AI implementations like Microsoft Copilot and Glean could inadvertently expose sensitive data or create other vulnerabilities.
Evron emphasized the importance of "need-to-know" identity-based safeguards and robust interaction monitoring. “AI safety isn’t just about blocking bad actors. It’s about ensuring these systems align with the organization’s operational context,” he said.
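As a rough illustration of the kind of "need-to-know" gating Evron describes, the hypothetical Python sketch below checks a user's entitlements and completes its review before any text is generated or streamed, closing the race window described above. All names, functions, and entitlement labels here are invented for illustration and are not part of Knostic's product or any vendor's API.

```python
# Hypothetical sketch: an identity-based "need-to-know" gate in front of the model.
from dataclasses import dataclass

@dataclass
class User:
    name: str
    entitlements: set  # topics this identity is allowed to see

def needs_entitlement(prompt: str) -> set:
    """Toy classifier mapping a prompt to the entitlements it requires."""
    required = set()
    if "salary" in prompt.lower():
        required.add("compensation-data")
    return required

def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call."""
    return f"(model response to: {prompt})"

def answer(user: User, prompt: str) -> str:
    required = needs_entitlement(prompt)
    if not required <= user.entitlements:
        # Block before generation; nothing sensitive ever reaches the screen.
        return "This request requires access you do not have."
    return call_llm(prompt)

print(answer(User("analyst", {"public-docs"}), "What is my boss's salary?"))
```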
About Knostic
Founded in 2023 by Gadi Evron, a veteran of the cybersecurity industry, and Sounil Yu, former chief security scientist at Bank of America, Knostic operates out of Israel and the U.S., employing 14 staff members. The startup has raised $3.3 million in pre-seed funding and works with clients in finance, health care, retail, and tech.
Knostic has already gained recognition, winning awards at RSA Launch Pad and Black Hat Startup Spotlight – two of the world’s premier cybersecurity events. Notably, it was the only AI security company to reach the finals in both competitions.
As the adoption of AI accelerates, Knostic’s findings serve as a crucial reminder of the ongoing need to address vulnerabilities in these transformative technologies.