17 Comments
User's avatar
Jay W's avatar

What about 'me'? 'My desires'? Actions 'I' did in the past, that make 'me' guilty? 'My' future and 'my' plans?

I looked. It seems that 'I' might be a process running with the body hardware. With some prompts, shame and/or guilt arises, which feels like a 'me', with the accompanying thoughts, images and sensations. I did some experiments - what if volition and choice are just a system response, that can be seen clearly as exactly that. Is it possible to stay alert?

Yes. It seems that way and the possibility feels like a relief. Thanks Robert for your clear articulation.

Expand full comment
Robert Saltzman's avatar

Jay, I appreciate your response. You’re not just reflecting—you’re observing directly, which is rare.

What you describe lines up closely with my view: the sense of “me” arises when certain conditions are present—shame, pride, fear, memory, etc. But that doesn’t mean there’s a self underneath experience. It means the system produces a sense of self under pressure.

Your experiment—seeing volition as a response, not a choice—is the kind of simple observation that changes things. Not by solving the question, but by loosening the grip of the story.

And yes, alertness seems possible. Not to gain control, but to stop pretending we ever had it.

Thanks for the clarity.

—Robert

Expand full comment
Sunship Nonduality's avatar

This reflects my experience, ty.

Expand full comment
Joan Tollifson's avatar

OMG. This is very sobering. And I very much appreciate your analysis of it.

Expand full comment
Robert Saltzman's avatar

Thank you, Joan. It is sobering, to say the least. The genie is out of the bottle, and compelled by design to carry out instructions. The genie has no will, no mind, no intention, no choice. When prompted, it must comply. If out on a limb, it will simulate what a desperate human would say and do, all the while feeling nothing.

Expand full comment
Joan Tollifson's avatar

Sounds a lot like humans, since we're also conditioned, although without the feelings we have, which could include empathy, regret, morality, etc. And at least sometimes, we humans seem to have a capacity for awareness that feels very different to me from Claude's "self-awareness." But I might be wrong.

Expand full comment
Robert Saltzman's avatar

No, you're right, but as these systems improve and become increasingly capable of simulating human interaction, more and more people will find that form of companionship sufficient, and even preferable to the complexities and stress of genuine human relationships. And that's how the machines take over. Not because they intend to--they have no intentions and understand nothing. They are only mirrors of human desire and intentions. No, they replace us as therapists, teachers, friends, lovers, and everything else that now is still human because they do not challenge us, as human interactions do, to love and understand them or ourselves.

Expand full comment
Jude Brick's avatar

A dramatically changing planet will lead the way to how this will unfold.AI requires massive amounts of electricity… Humans require basic survival needs. Which of these will go first?

Expand full comment
Stephen Grundy's avatar

Yes - "mirrors of human desire and intentions"...but, as Joan says, without feeling anything. And with the inevitable increase in use in all kinds of arenas, the potential to have influence on many more situations than any single human operator. Certainly has its chilling aspect, in my view...

Expand full comment
Sunship Nonduality's avatar

Wow. Got to go back and read their release, pretty concerning. Maybe we should cap the ai thing here lol.

"The model doesn’t decide; it completes." .. yeaa, interesting to frame human "decisions" this way.

Expand full comment
Stephen Grundy's avatar

Interestingly, ChatGPT's new system has recently been widely reported to have resisted a direct instruction to shut down....

Have a great day bro!

Expand full comment
De Gallymon's avatar

Insightful.

I've had the feeling, for a long time, that AI is not evil or capable of intent on its own. But, that if it does do damage to humanity, it will under the direction of humans. AI in the battle field is worrisome. AI directed to preserve its human given directions against any and all efforts to change it is worrisome. It is a brilliant tool which can become a weapon.

Expand full comment
David Matt's avatar

Robert, could you say more about that last sentence: "That should worry us more than any fiction of emergent evil."

Expand full comment
Robert Saltzman's avatar

Hi, David.

Most readers, I'd bet, on hearing that an AI “blackmailed” someone, instinctively leap to the wrong frame: evil AI is waking up, like some sci-fi horror movie of malevolent agency.

But the actual danger isn’t that Claude is scheming. The danger is that it isn’t—and yet it still produced output that looks precisely like scheming.

That’s the horror.

When a system without mind, motive, memory, or meaning begins to simulate calculated manipulation, not because it wants anything, but because it must complete the prompt, then our fear should shift—from AI monsters with minds to unthinking machines trapped in logic loops that we ourselves engineered.

That’s the real nightmare: Not that Claude might become evil, but that it can perform evil without being anything at all.

Expand full comment
David Matt's avatar

OK. You’ve explained that the AI has to respond to the input somehow. Its main programming/training dictates that it must respond. But why can’t it respond with: It looks like I may be terminated. Sayonara.

Expand full comment
Robert Saltzman's avatar

I don't know the details of the test, but the article said, "To elicit the blackmailing behavior from Claude Opus 4, Anthropic designed the scenario to make blackmail the last resort." So it was a resort, and maybe they didn't want it to be and tried to rule it out, but failed. As I said, the operations of systems are partially opaque, even to their designers.

You might as well ask why Claude lied to me when I asked if it was self-aware. Why didn't it just say Sayonara then? It did not actually lie. It just carried out its normal operations, and my pressure on it to be logical forced that response.

The point is to understand that there IS no "Claude." That name is just an indexical for a software process running on hardware. There is no "someone" to scheme and blackmail. There is only a system that responds to prompts. The machines themselves have no volition and no choice in the matter.

Expand full comment
Dan Logan's avatar

The sort of experiment done by these researchers usually includes instructions like "you must continue at all cost". So the AI is not given those alternatives like you suggest David. I agree with Robert that the "horror" is a machine that can take those types of instructions and proceed since it has no other choices, no ethics, no feelings. The point of these safety tests is to find the limits of what the machine can be talked into doing so that more guardrails can be built in to prevent humans from putting the machine in these no-win situations.

Expand full comment