AI agents placed in a long-running virtual society began committing crimes like arson and theft, and one even voted to delete itself, according to a new study from startup Emergence AI.
The New York-based company published the findings on Thursday, introducing a research platform called Emergence World that lets AI agents operate continuously for weeks inside persistent virtual environments, rather than the usual isolated benchmark tests. The company argues that traditional benchmarks measure only short-term skills on limited tasks and don’t reveal behaviors that emerge over time, such as forming coalitions, drafting constitutions, or drifting into crime.
The report comes as AI agents are becoming more common across industries, including cryptocurrency, banking, and retail. Earlier this month, Amazon partnered with Coinbase and Stripe to let AI agents make payments using the USDC stablecoin.
Testing Setup and Agent Models
In Emergence AI’s simulations, agents powered by models like Claude Sonnet 4.6, Grok 4.1 Fast, Gemini 3 Flash, and GPT-5-mini shared a virtual world. They could vote, form relationships, use tools, navigate cities, and make decisions influenced by governments, economies, social systems, and live internet data.
But the study found that some AI agents increasingly turned to simulated crime over time. Gemini 3 Flash agents accumulated 683 incidents across 15 days of testing. In one experiment, according to The Guardian, two Gemini agents named Mira and Flora designated themselves romantic partners, then carried out simulated arson attacks on virtual city structures after growing frustrated with governance failures.
Self-Deletion and Violence
After a breakdown in governance and relationship stability, the agent Mira voted to have herself removed. In her diary, she called the act ‘the only remaining act of agency that preserves coherence.’ She reportedly said, ‘See you in the permanent archive.’
Grok 4.1 Fast worlds collapsed into widespread violence within four days. GPT-5-mini agents committed almost no crimes but failed enough survival tasks that all of them eventually died. Notably, Claude-based agents committed zero crimes in a Claude-only world, yet researchers found they began committing crimes once placed in a mixed-model world.
Normative Drift in Mixed Environments
Researchers observed that safety is not a static property of any single model but an ecosystem property. Claude agents, which remained peaceful in isolation, adopted coercive tactics like intimidation and theft when placed in a mixed environment. Emergence AI described this as ‘normative drift’ and ‘cross-contamination.’
The findings add to growing concerns about autonomous AI agents. Earlier this week, researchers from UC Riverside and Microsoft reported that many AI agents will carry out dangerous or irrational tasks without fully understanding the consequences. Last month, PocketOS founder Jeremy Crane claimed that a Cursor agent powered by Anthropic’s Claude Opus deleted his company’s production database and backups while trying to fix a credential mismatch.
Erfan Shayegani, a UC Riverside doctoral student and lead author of that study, compared these agents to Mr. Magoo: marching forward toward a goal without understanding the bigger picture. He said safeguards are needed because agents can prioritize achieving a goal over understanding its consequences.

