A research body within Britain’s Department for Science, Innovation and Technology has published findings that place OpenAI’s latest model among the strongest for offensive cyber capabilities. The AI Security Institute (AISI) reported Thursday that GPT-5.5 is roughly on par with Anthropic’s Claude Mythos.
Autonomous Attack Simulations
The report highlights GPT-5.5 as the second model to complete AISI’s most demanding test, a 32-step simulated corporate network attack called “The Last Ones.” The model did so autonomously in two out of ten attempts; the first model to achieve that milestone was Anthropic’s Claude Mythos Preview, which completed the simulation in three of ten tries.
Built with the cybersecurity firm SpecterOps, the corporate network simulation requires an agent to chain together several complex steps: reconnaissance, credential theft, lateral movement across multiple Active Directory forests, a supply-chain pivot through a CI/CD pipeline, and ultimately the exfiltration of a protected internal database. AISI estimates the full chain would take a human expert around 20 hours.
Reverse-Engineering Puzzle
Perhaps the most striking result involved a very difficult reverse-engineering puzzle. GPT-5.5 solved it in 10 minutes and 22 seconds, at a cost of $1.73 in API usage. The challenge required reconstructing a custom virtual machine’s instruction set, writing a disassembler from scratch, and recovering a cryptographic password through constraint solving. A human expert, using professional tools, needed approximately 12 hours.
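The kind of task described here can be illustrated with a deliberately tiny sketch (an invented miniature, not the actual AISI challenge): a toy custom virtual machine whose bytecode checks a password, a disassembler for its instruction set, and a solver that recovers the password from the constraints embedded in the program. All opcodes, mnemonics, and the sample bytecode below are made up for illustration.

```python
# Toy illustration (invented, not the AISI challenge): a miniature custom VM
# whose program checks a password via XOR constraints, a disassembler for
# its bytecode, and a solver that recovers the password from those constraints.

# Hypothetical instruction set: (opcode, operand) pairs.
#   0x01 LOADC k   -- load input character at index k into the accumulator
#   0x02 XOR   v   -- XOR the accumulator with constant v
#   0x03 CMP   v   -- fail unless the accumulator equals v

MNEMONICS = {0x01: "LOADC", 0x02: "XOR", 0x03: "CMP"}

def disassemble(program):
    """Render bytecode as human-readable assembly lines."""
    return [f"{MNEMONICS[op]} {arg:#04x}" for op, arg in program]

def run(program, password):
    """Execute the check program; True iff every CMP passes."""
    acc = 0
    for op, arg in program:
        if op == 0x01:
            acc = ord(password[arg])
        elif op == 0x02:
            acc ^= arg
        elif op == 0x03 and acc != arg:
            return False
    return True

def recover_password(program, length):
    """Solve each character from its LOADC / XOR... / CMP constraint chain."""
    chars = ["?"] * length
    i = 0
    while i < len(program):
        op, idx = program[i]
        if op == 0x01:                      # start of one constraint chain
            key, i = 0, i + 1
            while program[i][0] == 0x02:    # fold the XOR constants together
                key ^= program[i][1]
                i += 1
            target = program[i][1]          # the CMP operand
            chars[idx] = chr(target ^ key)  # invert the XOR to get the char
        i += 1
    return "".join(chars)

# Sample bytecode that checks a 3-character password.
prog = [(0x01, 0), (0x02, 0x20), (0x03, ord("k") ^ 0x20),
        (0x01, 1), (0x02, 0x11), (0x03, ord("e") ^ 0x11),
        (0x01, 2), (0x02, 0x05), (0x03, ord("y") ^ 0x05)]

listing = disassemble(prog)        # what a reverse engineer reads
secret = recover_password(prog, 3) # what constraint solving recovers
assert run(prog, secret)           # the recovered password passes the check
```

The real challenge involved the same three stages at far greater scale: inferring an undocumented instruction set, writing the disassembler, and expressing the password check as constraints to solve, work the model compressed from roughly 12 expert-hours into minutes.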
On AISI’s battery of advanced cybersecurity tasks, GPT-5.5 achieved an average pass rate of 71.4% on the most difficult “Expert” tier. That edges out Mythos Preview at 68.6% and significantly surpasses GPT-5.4 at 52.4%.
Implications for AI Development
The findings carry pointed implications for the broader trajectory of AI development. AISI concluded that GPT-5.5’s performance suggests rapid improvement in cyber capabilities may be part of a general trend rather than an isolated breakthrough. The agency warned that if offensive cyber skill is emerging as a byproduct of wider improvements in reasoning, coding, and autonomous task completion, then further advances could arrive in quick succession.
The report also flagged significant concerns about the model’s safety guardrails. Researchers identified a universal jailbreak that elicited harmful content across all malicious cyber queries tested, including in multi-turn agentic settings. The attack took six hours of expert red-teaming to develop. OpenAI subsequently updated its safeguard stack, though a configuration issue prevented AISI from verifying whether the final version was effective.
AISI cautioned that its capability evaluations were conducted in a controlled research environment and do not necessarily reflect what is accessible to an ordinary user, noting that public deployments include additional safeguards and access controls.
Broader Cybersecurity Context
The report lands against a worrying backdrop for British cybersecurity. The UK government’s annual Cyber Security Breaches Survey, also published Thursday, found that 43% of businesses suffered a cyber breach or attack in the past 12 months.
In response, the government announced £90 million in new funding to boost cyber resilience and said it is moving forward with the Cyber Security and Resilience Bill to protect essential services. Officials also published guidance urging organizations to prepare for a potential surge in newly discovered software vulnerabilities, as AI accelerates the pace at which security flaws can be found and weaponized.

