OpenAI and Anthropic have mutually evaluated AI models

Most AI companies view each other as competitors, but today OpenAI and Anthropic announced that they had agreed to mutually evaluate each other's publicly available systems and share the results. The full reports are fairly technical, but they are worth reading for anyone who follows the details of AI development. Overall, the analysis revealed weaknesses in both companies' models and produced recommendations for improving future safety evaluations.
Anthropic tested OpenAI's models for traits such as sycophancy, a propensity for self-preservation, support for human misuse, and capabilities that could undermine safety evaluations and oversight. The report says that the o3 and o4-mini models performed comparably to Anthropic's own, but it raised concerns about possible misuse of the general-purpose GPT-4o and GPT-4.1 models. In addition, sycophancy was observed to some degree in every model tested except o3.

It's worth noting that Anthropic's tests did not include OpenAI's latest release, GPT-5, which introduces Safe Completions, a feature designed to protect users and the community from potentially dangerous requests. That feature arrives on the heels of the first wrongful-death lawsuit against OpenAI: the parents of a teenager claim he spent months discussing suicidal thoughts and plans with ChatGPT before taking his own life.
For its part, OpenAI tested Anthropic's models for jailbreak resistance, adherence to the instruction hierarchy, susceptibility to hallucinations, and scheming. The results showed that Claude performs well on the instruction-hierarchy tests and also has a high refusal rate in situations where its answer could be unreliable.
The joint evaluation is particularly interesting against the backdrop of the companies' strained relationship: OpenAI reportedly violated Anthropic's terms of service when its engineers used Claude while developing new GPT models, which led Anthropic to cut off OpenAI's access to its tools earlier this month. Still, AI safety is becoming a growing concern, especially amid calls from critics and legal experts for rules to protect users, particularly minors.