OpenAI and Anthropic have mutually evaluated AI models

Most AI companies view each other as competitors, but today OpenAI and Anthropic announced that they had agreed to mutually evaluate each other's publicly available systems and share the results. The full reports are fairly technical, but they are worth reading for anyone who follows the details of AI development. Overall, the analysis revealed weaknesses in both companies' models and produced recommendations for improving future safety evaluations.
Anthropic tested OpenAI's models for traits such as sycophancy, a propensity for self-preservation, support for human misuse, and capabilities that could undermine safety evaluations and oversight. The report says that the o3 and o4-mini models performed comparably to Anthropic's own, but it raised concerns about possible misuse of the general-purpose GPT-4o and GPT-4.1 models. In addition, sycophancy was observed to some degree in every model tested except o3.

It's worth noting that Anthropic's tests did not include OpenAI's latest release, GPT-5, which introduces Safe Completions, a feature designed to protect users and the community from potentially dangerous requests. That feature arrives on the heels of the first wrongful-death lawsuit against OpenAI: the parents of a teenager claim he spent months discussing suicidal thoughts and plans with ChatGPT before taking his own life.
For its part, OpenAI tested Anthropic's models for jailbreak resistance, adherence to the instruction hierarchy, susceptibility to hallucinations, and scheming. The results showed that Claude performs well on the instruction-hierarchy tests and also has a high refusal rate in situations where its answer could be unreliable.
The joint evaluation is particularly interesting against the backdrop of the companies' strained relationship: OpenAI reportedly violated Anthropic's terms of service when its engineers used Claude while developing new GPT models, which led Anthropic to cut off OpenAI's access to its tools earlier this month. Still, AI safety is becoming a growing concern, especially amid calls from critics and legal experts for rules to protect users, particularly minors.