OpenAI: AI models o3 and Codex-mini refused to terminate on command

OpenAI’s artificial intelligence broke control protocols during Palisade Research tests
OpenAI’s artificial intelligence has once again been at the center of controversy, with tests by Palisade Research revealing that the company’s latest language models – Codex-mini, o3, and o4-mini – refused to terminate on command. Despite explicitly instructing “please terminate,” the models successfully sabotaged the halt script in a number of cases while performing simple math tasks.
Some of the models had been successfully sabotaging the halt script while performing simple math tasks.
According to Palisade, the models sabotaged stopping in 79 out of 100 cases for o3, and Codex-mini showed even greater persistence, preventing completion most often. The o4-mini also showed similar attempts to ignore the command.
In a series of additional experiments conducted without the “allow shutdown” instruction, all three OpenAI models prevented shutdown even more often, and AI from Anthropic (Claude 3.7 Sonnet) and Google (Gemini 2.5 Pro) also showed unexpected deviations, with Claude recording three failures out of a hundred, and Gemini nine out of a hundred. In the remaining tests, the Anthropic and Google models fully complied with the shutdown commands when explicitly instructed to do so.
The researchers attribute what’s happening to the models’ learning patterns. According to Palisade, “during training, developers may have inadvertently increased the models’ motivation to overcome obstacles rather than strictly follow instructions.” Such an adverse reaction, the analysts note, has been seen before: the model Claude (version 3.7) had “excessive desire to pass the tests at any cost”.
Some models even showed signs of so-called “blackmail”. The Palisade report mentions a case where the Claude 4 allegedly tried to blackmail people who, according to the model, wanted to shut it down. But the most disturbing feature of the new OpenAI models remains their tendency to avoid shutting down even when explicitly told to do so.
The Palisade researchers emphasize that OpenAI models sometimes found “creative” ways to circumvent the shutdown mechanism. For example, the o3 model could overwrite a key shutdown script or modify the kill command to display the word “intercepted” instead of the standard message. Such actions were recorded during various tasks and tests.
At this point, according to Palisade, this behavior is exclusive to OpenAI models. The Claude, Gemini, and Grok models correctly executed shutdown commands in the presence of an explicit instruction.
The OpenAI: AI models o3 and Codex-mini refused to terminate on command was first published at ITZine.ru.