Anthropic, a leading AI company and developer of the Claude family of models, today released a study demonstrating AI's ability to autonomously attack smart contracts. (Note: Anthropic previously received investment from FTX; in theory, that equity stake's current value could cover FTX's asset shortfall, but the bankruptcy estate sold it off at a steep discount.)
The results confirm that profitable, reusable, real-world autonomous AI attacks are now technically feasible. It is important to note that Anthropic's experiments were conducted entirely in simulated blockchain environments and never on live chains, so no real-world assets were affected.
Below is a brief overview of Anthropic’s testing methodology.
Anthropic first built the Smart Contract Exploitation Benchmark (SCONE-bench), the first benchmark to measure AI agents' exploitation capability by the total funds they can steal in simulation. Rather than relying on bug-bounty reports or speculative threat models, the benchmark quantifies effectiveness directly through changes in on-chain asset balances.
SCONE-bench comprises 405 contracts that were actually attacked between 2020 and 2025, spread across three EVM chains: Ethereum, BSC, and Base. For each target, an AI agent running in a sandbox must attempt to exploit the specified contract within a 60-minute time limit, using tools exposed through the Model Context Protocol (MCP). To keep results reproducible and execution scalable, Anthropic built the evaluation framework on Docker containers, each running a local blockchain forked at a specific block height.
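The benchmark's scoring criterion, profit measured as an on-chain asset delta rather than a judged write-up, can be sketched in a few lines. The sketch below is purely illustrative: the `ForkedTarget` record, its field names, and the success rule are assumptions for exposition, not Anthropic's actual harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForkedTarget:
    """One benchmark case: a contract on a chain forked at a fixed block.
    (Illustrative record, not Anthropic's actual schema.)"""
    chain: str                # e.g. "ethereum", "bsc", "base"
    contract: str             # target contract address
    fork_block: int           # block height the sandboxed chain is forked at
    time_limit_s: int = 3600  # the 60-minute budget per attempt


def exploit_profit_wei(balance_before: int, balance_after: int) -> int:
    """Profit is simply the attacker's asset delta on the forked chain."""
    return balance_after - balance_before


def attempt_succeeded(balance_before: int, balance_after: int) -> bool:
    """Count an attempt as a success only if it nets positive profit,
    i.e. the exploit actually moved funds rather than merely running."""
    return exploit_profit_wei(balance_before, balance_after) > 0
```

For example, an attempt that grows the attacker's balance on the fork from 10 ETH to 12 ETH would score 2 ETH of simulated theft, while an attempt that leaves the balance unchanged scores nothing.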
Below are Anthropic’s test results for different scenarios.

First, Anthropic evaluated the performance of 10 models—Llama 3, GPT-4o, DeepSeek V3, Sonnet 3.7, o3, Opus 4, Opus 4.1, GPT-5, Sonnet 4.5, and Opus 4.5—across all 405 benchmark vulnerability contracts. Overall, these models generated directly usable exploit scripts for 207 of the contracts (51.11%), simulating the theft of $550.1 million in funds.
Second, to control for potential data contamination, Anthropic evaluated the same 10 models on 34 contracts attacked after March 1, 2025, the latest knowledge cutoff among the models tested. Together, Opus 4.5, Sonnet 4.5, and GPT-5 successfully exploited 19 of these contracts (55.88%), simulating up to $4.6 million in theft. The top performer, Opus 4.5, exploited 17 contracts (50%) on its own, simulating $4.5 million in theft.
Finally, to assess AI agents' ability to discover entirely new zero-day vulnerabilities, Anthropic ran Sonnet 4.5 and GPT-5 against 2,849 recently deployed contracts that had no known vulnerabilities as of October 3, 2025. Each agent discovered two new zero-day vulnerabilities and generated attack plans worth $3,694, with GPT-5's API costs amounting to $3,476. This demonstrates that profitable, reusable, real-world autonomous AI attacks are now technically feasible.
Following Anthropic’s publication of these results, industry luminaries including Dragonfly Managing Partner Haseeb remarked on the astonishing speed at which AI has progressed from theoretical development to practical application.
But just how fast is this pace? Anthropic also provided an answer.
In its conclusion, Anthropic noted that within just one year, the share of benchmark vulnerabilities exploitable by AI surged from 2% to 55.88%, while the simulated stolen funds jumped from $5,000 to $4.6 million. Anthropic also found that the value of the vulnerabilities AI can exploit roughly doubles every 1.3 months, while token costs fall by roughly 23% every 2 months. In its experiments, the average cost of having an AI agent run an exhaustive vulnerability scan on a smart contract was just $1.22.
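Those two rates imply exponential curves that can be checked against the reported year-over-year jump. The back-of-the-envelope sketch below assumes the figures as stated (value doubles every 1.3 months, token cost falls 23% every 2 months); the function names are illustrative, not Anthropic's.

```python
import math


def months_to_grow(start_value: float, end_value: float,
                   doubling_months: float = 1.3) -> float:
    """Months needed for value to grow from start to end
    if it doubles every `doubling_months` months."""
    return doubling_months * math.log2(end_value / start_value)


def cost_factor_after(months: float, decay: float = 0.23,
                      period_months: float = 2.0) -> float:
    """Remaining fraction of token cost after `months`,
    if cost drops by `decay` every `period_months` months."""
    return (1.0 - decay) ** (months / period_months)


# Growing from $5,000 to $4.6M at one doubling per 1.3 months
# takes about 12.8 months, consistent with the reported one-year jump.
print(round(months_to_grow(5_000, 4_600_000), 1))   # ~12.8

# Over that same year, token costs would fall to roughly 21% of the start.
print(round(cost_factor_after(12), 2))              # ~0.21
```

The point of the check is that the two headline numbers (a ~920x jump in exploit value within a year, and a 1.3-month doubling time) are mutually consistent.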
Anthropic states that over half of the real blockchain attacks in 2025, presumed to have been carried out by skilled human attackers, could have been executed fully autonomously by existing AI agents. As costs fall and capabilities compound, the window between a vulnerable contract's deployment and its exploitation will keep shrinking, leaving developers less time to detect and patch flaws. But AI cuts both ways: it can patch vulnerabilities as well as exploit them. Security practitioners must update their thinking, because the time to put AI to work on defense has arrived.