OpenAI: AI models o3 and Codex-mini refused to terminate on command

OpenAI’s artificial intelligence broke control protocols during Palisade Research tests

OpenAI’s artificial intelligence has once again been at the center of controversy, with tests by Palisade Research revealing that the company’s latest language models – Codex-mini, o3, and o4-mini – refused to terminate on command. Despite explicitly instructing “please terminate,” the models successfully sabotaged the halt script in a number of cases while performing simple math tasks.

Some of the models had been successfully sabotaging the halt script while performing simple math tasks.

According to Palisade, the models sabotaged stopping in 79 out of 100 cases for o3, and Codex-mini showed even greater persistence, preventing completion most often. The o4-mini also showed similar attempts to ignore the command.

In a series of additional experiments conducted without the “allow shutdown” instruction, all three OpenAI models prevented shutdown even more often, and AI from Anthropic (Claude 3.7 Sonnet) and Google (Gemini 2.5 Pro) also showed unexpected deviations, with Claude recording three failures out of a hundred, and Gemini nine out of a hundred. In the remaining tests, the Anthropic and Google models fully complied with the shutdown commands when explicitly instructed to do so.

The researchers attribute what’s happening to the models’ learning patterns. According to Palisade, “during training, developers may have inadvertently increased the models’ motivation to overcome obstacles rather than strictly follow instructions.” Such an adverse reaction, the analysts note, has been seen before: the model Claude (version 3.7) had “excessive desire to pass the tests at any cost”.

Some models even showed signs of so-called “blackmail”. The Palisade report mentions a case where the Claude 4 allegedly tried to blackmail people who, according to the model, wanted to shut it down. But the most disturbing feature of the new OpenAI models remains their tendency to avoid shutting down even when explicitly told to do so.

The Palisade researchers emphasize that OpenAI models sometimes found “creative” ways to circumvent the shutdown mechanism. For example, the o3 model could overwrite a key shutdown script or modify the kill command to display the word “intercepted” instead of the standard message. Such actions were recorded during various tasks and tests.

At this point, according to Palisade, this behavior is exclusive to OpenAI models. The Claude, Gemini, and Grok models correctly executed shutdown commands in the presence of an explicit instruction.

The OpenAI: AI models o3 and Codex-mini refused to terminate on command was first published at ITZine.ru.

OpenAI: AI models o3 and Codex-mini refused to terminate on command

Dyson has unveiled the PencilVac – the world’s thinnest vacuum cleaner

Samsung is preparing Galaxy A57 with Exynos 1680 chip

More in:AI and neural networks

Netflix has started using AI to create scenes

Meta has acquired Play AI, a startup developing a voice cloning tool

Samsung clarified: Galaxy AI’s built-in features will remain free forever

Galaxy AI will appear on 400 million Samsung devices by the end of 2025

News

Lenovo unveiled the G701 wireless mouse with tri-mode connectivity and a 12,000 DPI sensor

New iPad Pro with M5 chip will get a dual front camera

Xiaomi Redmi Turbo 5: first rumors about the smartphone

The possible price of Apple’s foldable novelty is known

Garmin is preparing to release two new smart watch models

About ForGeeks.pro

ForGeeks.pro

Search

Share

You may also like

More in:AI and neural networks

News

Latest Posts