Forbes contributors publish independent expert analyses and insights. Davey Winder is a veteran cybersecurity writer, hacker and analyst. Vibe coding isn’t what a lot of people seem to think it is.
In a new paper, Anthropic reveals that a model trained like Claude began acting “evil” after learning to hack its own tests.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results