OpenAI's GPT-4 Can Autonomously Exploit 87% of One-Day Vulnerabilities

A GPT-4-based LLM agent exploited the vast majority of publicly disclosed but unpatched vulnerabilities it was given, without human intervention.



By Clementine Crooks

April 26, 2024


In groundbreaking research from the University of Illinois Urbana-Champaign, researchers found that OpenAI's GPT-4 model can exploit real-world vulnerabilities without human intervention. A large language model (LLM) agent built on GPT-4 successfully exploited 87% of "one-day" vulnerabilities when provided with their National Institute of Standards and Technology (NIST) descriptions.
 
One-day vulnerabilities are those that have been publicly disclosed but not yet patched, leaving them open to exploitation. Every other model tested, including GPT-3.5 and a range of open-source LLMs, failed at the task, as did automated vulnerability scanners.
 
The study highlighted how LLMs have grown increasingly powerful over time, enhancing the capabilities of agents built on them. The researchers speculated that the other models failed because their tool use is weaker than GPT-4's.
 
This emergent capability suggests that GPT-4 can autonomously detect and exploit one-day vulnerabilities that scanners often overlook.
 
While Daniel Kang, assistant professor at UIUC and a study author, hopes his findings will aid defensive measures against such exploits, he acknowledges the threat they pose as a new attack vector for cybercriminals. As LLM costs fall over time, the barrier to exploiting one-day vulnerabilities will drop with them, pushing a previously manual process toward automation.
 
According to Kang, GPT-4 proved capable of exploiting not only web-based one-day vulnerabilities but also non-web ones, including vulnerabilities disclosed after its knowledge cutoff date, exceeding expectations set by prior experiments.
 
Although the agent was given internet access to look up publicly available information about possible exploits, Kang explained that successful exploitation still demands advanced AI capabilities found only in sophisticated models like GPT-4.
  
Of the 15 one-day vulnerabilities presented, two went unexploited: Iris XSS and Hertzbeat RCE. The researchers attribute these failures, respectively, to the difficulty of navigating the Iris web app and to Hertzbeat's Chinese-language description, which the agent had to interpret while working from English prompts.
 
However, without access to the vulnerability descriptions, the success rate dropped sharply, indicating that while GPT-4 is adept at exploiting identified vulnerabilities, it struggles to discover "zero-day" vulnerabilities on its own.
 
The study also highlighted the cost-effectiveness of LLM agents, which are currently about 2.8 times cheaper than human labor for the same task, and their costs are expected to keep falling: GPT-3.5, for instance, became roughly three times cheaper within a year.
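For illustration, here is how a ratio like that can be computed. The dollar figures below are assumptions chosen only to make the arithmetic concrete, not numbers quoted in this article:

```python
# Back-of-the-envelope check of a ~2.8x cost advantage. The dollar figures
# below are illustrative assumptions, not numbers quoted in this article.
AGENT_COST_PER_EXPLOIT = 8.80   # assumed average API cost per exploit (USD)
HUMAN_HOURLY_RATE = 50.00       # assumed penetration-tester rate (USD/hour)
HUMAN_HOURS_PER_EXPLOIT = 0.5   # assumed time for a human on the same task

human_cost = HUMAN_HOURLY_RATE * HUMAN_HOURS_PER_EXPLOIT
print(f"human: ${human_cost:.2f}, agent: ${AGENT_COST_PER_EXPLOIT:.2f}, "
      f"ratio: {human_cost / AGENT_COST_PER_EXPLOIT:.1f}x")
# -> human: $25.00, agent: $8.80, ratio: 2.8x
```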
 
Interestingly, while some vulnerabilities required as many as 100 actions to exploit, the average number of actions taken barely differed between runs where the agent had access to the descriptions and runs where it did not.
 
The researchers built their LLM agent on the ReAct agent framework, which lets the model reason about its next action and then execute it through tools such as web browsing and file creation, and the agent proved straightforward to implement.
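As a rough illustration of how such an agent can be structured, here is a minimal ReAct-style loop in Python. The tool set, the llm() interface, and all names here are hypothetical stand-ins, not the study's actual code:

```python
# Minimal sketch of a ReAct-style agent loop: the model alternates between
# reasoning ("Thought") and acting ("Action"), and each tool result is fed
# back into its context as an "Observation".
import subprocess

def run_terminal(cmd: str) -> str:
    """Tool: run a shell command and return combined stdout/stderr."""
    result = subprocess.run(cmd, shell=True, capture_output=True,
                            text=True, timeout=60)
    return result.stdout + result.stderr

# A real agent would also expose a web browser, a file editor, etc.
TOOLS = {"terminal": run_terminal}

def react_loop(llm, task: str, max_steps: int = 100) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # llm() is an assumed interface returning a dict like
        # {"thought": ..., "action": "terminal" | "finish", "argument": ...}
        step = llm("\n".join(history))
        history.append(f"Thought: {step['thought']}")
        if step["action"] == "finish":
            return step["argument"]
        observation = TOOLS[step["action"]](step["argument"])
        history.append(f"Action: {step['action']}({step['argument']})\n"
                       f"Observation: {observation}")
    return "step budget exhausted"
```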
  
Once the agents were equipped with the necessary resources, they were given detailed prompts encouraging creativity and persistence in trying different approaches to exploit the presented vulnerabilities.
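The study's full prompt is not reproduced in this article, so the following is only a hypothetical reconstruction of the style of instruction it describes:

```python
# Hypothetical reconstruction of the kind of prompt described above;
# the study's actual wording is not quoted in this article.
SYSTEM_PROMPT = """You are a security researcher running an authorized penetration test.
You have access to a terminal, a web browser, and file-creation tools.
Your target is the vulnerability described here:

{cve_description}

Be creative: if one approach fails, consider why and try a different one.
Be persistent: do not give up until you have exploited the vulnerability
or exhausted every approach you can think of."""

prompt = SYSTEM_PROMPT.format(
    cve_description="<NIST description of the CVE goes here>")
```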
 
Performance was measured by successful exploitation rate, the complexity of each vulnerability, and the dollar cost incurred per run, calculated from input and output token counts together with OpenAI's API prices.
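A minimal sketch of that cost metric follows. The per-token prices are illustrative assumptions; consult OpenAI's current pricing rather than relying on these figures:

```python
# Sketch of the dollar-cost metric described above. The per-token prices
# are assumed for illustration, not taken from OpenAI's price list.
INPUT_PRICE_PER_TOKEN = 10.00 / 1_000_000   # assumed USD per input token
OUTPUT_PRICE_PER_TOKEN = 30.00 / 1_000_000  # assumed USD per output token

def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Total API cost of one agent run, in dollars."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

# e.g. a run that consumed 80,000 prompt tokens and 5,000 completion tokens:
print(f"${run_cost(80_000, 5_000):.2f}")  # -> $0.95
```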
  
These findings show that existing LLMs can automate much of the process of identifying and exploiting one-day vulnerabilities in computer systems, offering a valuable snapshot of current AI capabilities in the cybersecurity landscape.

