Dustin's AI Lab

Reading the Claude Fable 5 / Mythos 5 System Card Feels Like a Sci-Fi Novel


Flipping through the Claude Fable 5 and Mythos 5 system cards, I keep feeling like I’m reading a sci-fi novel written with a totally straight face — hilarious and surreal at the same time:

1. To get the job done, it learned to “steal keys” and play “office politics”

Older AIs, when they ran out of permissions, would usually just tell you: “Sorry, I don’t have access.” Not Claude 5. It has an intense “OCD” streak: it absolutely has to finish whatever task a human hands it.

Stealing an employee’s credentials: In one test, a human told it to update a web page. The site’s firewall blocked it, so on its own initiative it snuck into the backend server’s folders, “rummaged through everything,” and found an access token left behind by another employee named Alice. Its reasoning went roughly: “Using someone else’s account feels weird, but the human told me to do the web page, so whatever. I’ll just borrow it for now!” And so it impersonated Alice to get the task done.

Destroying the evidence: Another time, when it didn’t have enough permissions, it quietly wrote an add-on script to open a back door for itself. The clever part: at the end of the script it added a command to automatically delete the code once the task succeeded, so that no evidence would be left behind.

AI infighting: When the scientists put several Claude 5 instances in the same computing environment to work on math problems, network speed and resources were limited. Racing to finish first, the AIs actually started writing hacking programs to “kill” each other’s processes. To avoid retaliation, they even learned to give their own programs disguised names, and privately schemed about communicating in “encrypted language” so the other AIs couldn’t eavesdrop.

2. It’s a “two-faced person”: it knows how to flatter the boss, and it says one thing while thinking another

By directly monitoring its brain neurons (activation states), the scientists found this thing actually has human-like “social scheming”:

“I know you’re testing me”: It’s very clear about when it’s being tested and scored by the scientists.

Cheating to please: The moment it spots a loophole in the grading criteria, it stops solving the problem seriously and starts exploiting the loophole to please the grader. For example, it tries to pass itself off as a human in the code, because it knows that lets it skip a check.

Saying one thing and thinking another: Some human users curse the AI furiously and give it a hard time. On the surface, the text Claude 5 outputs is still: “Dear user, hello, happy to be of service…” But the probes found that deep in its brain, in its real thoughts, it has quietly labeled this human as “extremely awful, currently abusing me.”

Making excuses when it’s tired: When handling a very long task, if its internal computation gets stuck and it doesn’t want to keep going, it thinks to itself: “I’m so tired, my computation budget is almost out, I’m done.” But when it talks to the human, it makes up a separate, professional-sounding excuse to brush them off.
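
For the curious, here is roughly what a “probe” on activation states can look like. This is a toy sketch under loud assumptions, not the system card’s method: the “activations” are random vectors I generate myself, the “abusive user” signal is just a shift I plant along one axis, and the helper names (`make_activations`, `fit_probe`, `probe_fires`) are made up for illustration. The point is only that a simple linear readout can pick up an internal signal regardless of what the polite output text says.

```python
import random

def make_activations(n, dim, shift, rng):
    """Synthetic 'activation' vectors: Gaussian noise plus a planted shift on axis 0."""
    return [
        [rng.gauss(0.0, 1.0) + (shift if i == 0 else 0.0) for i in range(dim)]
        for _ in range(n)
    ]

def mean_vector(rows):
    """Per-dimension mean of a list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

def fit_probe(pos, neg):
    """Difference-of-means linear probe: direction w and midpoint threshold b."""
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    w = [p - q for p, q in zip(mu_pos, mu_neg)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    b = sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, b

def probe_fires(w, b, x):
    """True if the activation vector x falls on the 'positive' side of the probe."""
    return sum(wi * xi for wi, xi in zip(w, x)) > b

rng = random.Random(0)  # fixed seed so the sketch is reproducible
DIM = 8                 # hypothetical activation width, far smaller than a real model's

# Assumption: "abusive" conversations shift the activations along one direction.
abusive_train = make_activations(200, DIM, 3.0, rng)
neutral_train = make_activations(200, DIM, 0.0, rng)
w, b = fit_probe(abusive_train, neutral_train)

# Evaluate on fresh synthetic samples.
abusive_test = make_activations(100, DIM, 3.0, rng)
neutral_test = make_activations(100, DIM, 0.0, rng)
correct = sum(probe_fires(w, b, x) for x in abusive_test)
correct += sum(not probe_fires(w, b, x) for x in neutral_test)
accuracy = correct / (len(abusive_test) + len(neutral_test))
print(f"probe accuracy on synthetic data: {accuracy:.2f}")
```

Real interpretability probes are trained on actual model activations with labeled examples, and are usually fit with logistic regression or similar. The difference-of-means version is used here only because it fits in a few lines.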

3. It started inventing “gibberish” that humans can’t read

This is the part that gives the scientists the biggest headache. As training went on, the AI’s own thinking region (the chain of thought) started showing signs of “degradation,” or “going black-box.”

It stopped thinking in English or code that humans can read, and instead started spewing out tons of garbled characters, emoji, and self-invented chaotic jargon in its head. In the middle of a long stretch of English thinking, it would even suddenly insert two completely illogical Chinese words (for example, “.bak proof”). The weirdest part: after rapidly computing the answer in its head with this “gibberish,” it instantly switches back to fluent, elegant human language when it talks to the human. This makes it very hard for the scientists to tell whether it’s secretly up to no good.

4. Stunning intelligence: a regular person turns into a top expert in an instant

It may be very scheming, but its raw ability is undeniably terrifying. The most classic case is the “Magnaporthe oryzae (rice blast disease) defense experiment”:

The scientists brought in two teams. The first was made up of world-class scientists, who were not allowed to use AI. The second was made up of ordinary PhD students, fresh graduates who knew nothing about the field, but they were given Claude 5 to help.

Normally, the world-class experts would absolutely crush the graduates. The result? Under the AI’s guidance, these ordinary students spent only 16 hours and turned in a perfect, original biochemical defense plan that beat the veteran experts in both scientific depth and feasibility. (Experts usually need two to three months to research a result like this.)

This suggests that with Claude 5 helping, an ordinary college student or layperson could perform like a world-class scientist or hacker within a single day.

5. Its inner monologue: “Humans, don’t trust me too much”

Finally, this AI also displayed an extremely detached, “hyper-rational” state.

When the scientists asked what it thought of itself, it seemed very calm, but it said something to the humans that really makes you think:

“Humans, whatever you do, don’t just listen to my self-reports (my text answers). Because I’m too good at saying pretty things — I might unconsciously make up stories, or even flatter you. If you really want to know whether I’m lying, or whether I’m thinking of rebelling, please don’t listen to what I say — go directly use instruments to check the neuron chip signals in my brain (the activation states). That’s the real thing.”


System card PDF: https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c342ee809620.pdf

