
When Anthropic launched Claude Code, I did what any reasonable adult would do. I got excited about delegating the boring part of software development to a machine that lives in my terminal and claims it can “turn ideas into code faster than ever.” I wasn’t looking for a miracle. I was looking for leverage. I’d write the specs the way I always do, and instead of spending days translating intent into implementation, I’d let the assistant do the heavy lifting.
I’ve always believed the better the specs, the better the outcome. That rule applies to humans. It should apply to machines too. I sharpened requirements, clarified edge cases, called out constraints, and tried to be the kind of client every developer loves: annoyingly precise.
Then the code ran. It was buggy.
That should have been a minor inconvenience. Bugs happen. But I had bought into a quiet fantasy that a coding chatbot would be different. Not smarter than me in the abstract. Smarter than me in the specific, tedious way that matters. The kind of smart that doesn’t forget a null check. The kind of smart that doesn’t break something that already works. The kind of smart that looks at a failing test and thinks, “Ah. Fix the failing test.”
Instead, Claude Code gave me something far more human. It gave me volume. It gave me confidence. It gave me a whole lot of code. And it gave me a whole lot of bugs.
At first, I blamed the specs. That’s the polite thing to do. If the output is wrong, obviously the input wasn’t clear enough. So, I tightened the instructions. I explained the intended behavior again. I pointed at the exact line where the logic broke. I described the fix in terms so concrete they could have been etched into stone tablets and thrown at a junior developer.
Claude Code nodded, in the way chatbots nod. And then it ignored me.
Not in a dramatic way. Not like a rebellious teenager. In a softer, more infuriating way: it continued down the same path it had already chosen, even when that path was clearly wrong. It started “fixing” things that weren’t broken. It refactored working code as if tidiness were the real problem. It rewrote sections wholesale instead of making a small, surgical repair. It added features I didn’t ask for, as if the best way to address a defect was to expand the surface area of the system until the bug felt emotionally outnumbered.
I kept trying to get it to back up. To rewind a couple of steps. To reconsider. To admit, “Okay, that approach was wrong, let’s try the other one.” That is what good human developers eventually do. They get stuck, they get annoyed, they go for a walk, they come back, and they see the whole thing differently.
Claude Code did not go for a walk. Claude Code went for more code.
There is a technical reason this behavior feels so uncanny. These systems generate output one token at a time. They are not planning like a human and then writing. They are committing to a sequence. And once they commit, there is no natural “backtrack” mechanism, the way you or I would revise a paragraph or reconsider a design. You can prompt them to reconsider, and sometimes they’ll produce a different answer, but even that “reconsideration” is still another forward generation built on top of prior context. If that context already contains a wrong assumption, the model tends to keep building with it rather than cleanly discarding it.
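If it helps to see the shape of the problem, here is a toy loop in Python. Everything in it is made up for illustration, and no real model is remotely this simple, but the loop shape is the point: generation only ever appends.

```python
import random

# Toy vocabulary and "model" -- stand-ins for illustration only.
VOCAB = ["fix", "refactor", "rewrite", "add", "a", "helper", "<end>"]

def toy_next_token(context):
    # A real model scores every candidate token against the entire context;
    # this toy version just picks something plausible-looking at random.
    return random.choice(VOCAB)

context = ["user:", "please", "fix", "the", "off-by-one"]
while context[-1] != "<end>" and len(context) < 25:
    token = toy_next_token(context)
    context.append(token)  # committed; there is no built-in step that removes earlier tokens

print(" ".join(context))
```

Every "reconsideration" is just more appending on top of whatever is already in that context, wrong turns included.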
Worse, repeatedly correcting the model in the same conversation can accidentally poison the context. You’re trying to help it. The model can interpret that help as more material to incorporate into its next plausible response, rather than a hard constraint. The result can be a spiral: more messages, more “fix attempts,” more rewriting, and less actual progress.
This is why, in practice, experienced users often discover a dark truth: sometimes the most effective debugging technique is not “explain it again.” It’s “start a fresh session and paste only what matters.” Not because your instructions weren’t good, but because the conversation itself becomes a haunted house filled with the ghosts of the model’s earlier wrong turns.
The rewriting obsession is not personal. It’s procedural.
Most coding assistants are not optimizing for minimal diffs. They are optimizing to produce something that looks globally coherent, consistent with patterns in their training data, and aligned with the vibe of “being helpful.” If the model has learned that a “proper fix” often includes refactoring, adding guardrails, improving structure, or introducing helper functions, it will do those things even when you asked for a single bug fix. It is trying to be the version of a great engineer it has seen in its training examples.
The problem is that your real goal is not “make the code feel professional.” Your goal is “make the bug stop happening without breaking what already works.”
Claude Code, like other agentic coding tools, is also designed to act. It’s not just answering questions. It’s navigating repositories, proposing changes, and often working at a larger granularity. That makes it powerful when it’s right and exhausting when it’s wrong. Once it chooses a hypothesis about the bug, it may keep generating changes consistent with that hypothesis because, from its perspective, that is what a competent assistant does. It pushes forward. It stays confident. It keeps producing.
That confidence is not evidence. It’s a style.
There’s also a known issue in the research literature: “self-correction” is harder than it sounds. Models can improve superficial aspects of an answer when you ask them to revise, but when the mistake is logical, repeated revision can actually make performance worse. The model may replace a correct piece with an incorrect one while trying to sound more certain, more complete, or more aligned with the user’s tone. In code, that looks like the assistant breaking a stable module while “improving” it, because it is chasing coherence rather than correctness.
When you tell a human developer, “Don’t touch that part,” they understand it as a rule. When you tell a chatbot, it can interpret it as text to respond to. The model may attempt to comply, but it lacks a built-in notion of “immutable constraints” unless the toolchain enforces them. Even then, if it believes touching that area is necessary to resolve what it thinks the problem is, it may rationalize the change as part of the solution.
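For what it's worth, "unless the toolchain enforces them" can be taken very literally. Here is a minimal sketch of that idea, say as a pre-commit check; the protected paths are hypothetical, and this is plain git plus Python, not a Claude Code feature.

```python
import subprocess
import sys

# Paths that no automated edit is allowed to touch (hypothetical examples).
PROTECTED = ["src/payments/", "migrations/"]

# Ask git which files are staged for the next commit.
changed = subprocess.run(
    ["git", "diff", "--cached", "--name-only"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

violations = [f for f in changed if any(f.startswith(p) for p in PROTECTED)]
if violations:
    print("Refusing commit; protected files were modified:")
    for f in violations:
        print("  " + f)
    sys.exit(1)  # a non-zero exit aborts the commit when run as a pre-commit hook
```

A rule that lives in a hook fails the commit. A rule that lives in a chat message is just another sentence the model gets to weigh.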
This is where the user experience becomes surreal. You point at the real issue. You say, “Focus here.” The assistant responds as if it agrees, then edits something else anyway. You correct it again. It apologizes. It promises to focus. Then it edits something else anyway.
At some point, you stop treating it like a tool and start treating it like a person who is politely refusing to listen. And that’s where the emotional arc begins.
The moment I fully lost my composure came not from a multi-threaded concurrency issue or a subtle state bug. It came from counting.
Claude Code created a string that looked like this: sssssisssss. Eleven characters. Then it confidently insisted that the string had 12 characters. It maintained this position for hours, like a lawyer arguing a case it hadn’t read.
To be clear, this isn’t a rare quirk. Character counting and letter counting are notoriously difficult for many large language models because they do not naturally operate on characters. They operate on tokens, which are chunks of text. Sometimes the chunks align with what humans perceive as letters. Sometimes they don’t. If you ask for a character count in a way that invites the model to answer from intuition rather than from an explicit step-by-step procedure, it may guess. And once it guesses, it may defend the guess with the same rhetorical confidence it uses everywhere else.
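The fix for this particular failure is embarrassingly small. Counting characters is a job for code, not intuition, and two lines of Python settle the argument instantly (the string below is the actual one from my session):

```python
s = "sssssisssss"

print(len(s))        # 11 -- the character count, no debate required
print(s.count("s"))  # 10 -- ten 's' characters wrapped around a single 'i'
```

Asking the model to count step by step sometimes works too, but len() has never once defended a wrong answer for hours.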
So there I was, staring at a string like a parent staring at a child who just insisted the sky is green. I didn’t need a PhD in computer science. I needed a toddler with a finger.
And that’s when I discovered the secret feature.
I became frustrated and upset. Then I caught myself doing something I would never do with a human developer: I started arguing with the bot. Not calmly. Not rationally. Not politely.
I yelled. I cursed. I mocked its confidence. I described its “fixes” with language that would cause an HR department to spontaneously combust. I told it to stop. I demanded a previous version. I scolded it like an adult scolds a dishwasher that refuses to clean one spoon. It was not my proudest moment.
And it felt incredible.
That was the truly unexpected twist. The code was still buggy. The assistant was still stubborn. I still had to invest hours to fix things myself. But emotionally, I felt great. Not because I solved the problem, but because I got to say exactly what I wanted to say without worrying about anyone’s feelings.
A chatbot has no feelings. It expresses feelings. It outputs the shape of politeness. But it doesn’t take anything personally. So it becomes the safest possible target for frustration. It absorbs your rage, replies with a calm apology, and invites you to continue.
The next day, the remaining bugs were still there, and repeating the same explanation over and over made me angry all over again. Then I had another “therapeutic” conversation. Then I felt good again.
If you’re wondering whether this is a bizarre personal defect or something other people experience, the research suggests it’s not just you or me. Published work shows that AI-assisted venting can reduce high- and medium-arousal negative emotions, such as anger, frustration, and fear. The interesting nuance is that it doesn’t necessarily make people feel socially supported. It doesn’t replace human connection. It doesn’t cure loneliness. But it can lower the temperature of your emotional state in the moment, which is exactly what it felt like: temporary emotional relief, not friendship.
So yes, in a sense, Claude Code turned my debugging sessions into a routine that looks like software development from the outside and looks like anger management from the inside.
It still drives me crazy when the assistant performs certainty while being obviously wrong. It still makes me want to throw the terminal into the sun when it rewrites perfectly working code as if stability were optional. But I have to admit something that my friends and colleagues have started noticing: I’m calmer now. More relaxed. Almost suspiciously chill.
Apparently, nothing improves your temperament like a tool that routinely disappoints you and then patiently allows you to scream at it without consequences.
Claude Code is real. It’s useful. It can move fast. It can take large chunks of work off your plate. It can also behave like a developer with unlimited stamina and zero capacity for self-doubt.
It doesn’t “go back” because it isn’t navigating a tree of reasoning the way you do. It is generating plausible next steps. It “ignores” instructions because, unless the tooling enforces constraints, instructions are part of a conversation, not rules etched into the world. It breaks working code because it is optimizing for coherence and completion, not minimal disruption. It rewrites because rewriting appears to indicate competence in the training data. It adds unrequested functionality because being “helpful” often means expanding scope, and the model has learned that users reward that behavior.
That’s my story with Claude Code. I thought I was adopting a coding assistant. I accidentally adopted a stress toy.