It sounds like what's going on is that the base versions come with a bunch of safeguards to ensure they don't say crazy things, but when you fine-tune the model some of those safeguards may go away
It's kinda hard to say since they don't tell us exactly what they fine-tuned the model with.
Well, yes, autocorrect++ is going to pattern match on whatever the training data says....
This entire article could be titled: We don't understand how LLMs work
As I've said ad nauseam, the real problem (as kinda highlighted here in this article) is people keep attributing "intent" to autocorrect++ and there is none. It's just dumb pattern matching....
Just to play devil's advocate, who's to say our human brains aren't just really good pattern matchers?
I think the concern is that if AI were ever unleashed as a fully autonomous agent, how easily it could spiral into dangerous ways of thinking, aka pattern matching.
Our language faculties may in fact work like an LLM. However, that doesn't equal "consciousness".
I agree, but that's why I said I'm just playing devil's advocate. I think the concern is that the AI will actually take action based on its potentially messed up pattern matching. (Like when it envisions a future without Jews, etc.)
LLMs by themselves have no intelligence. Human minds had to first generate the patterns that the LLMs are trained against.....as far as the LLM is concerned these patterns could be the order of raindrops dripping off a roof.....there is no "thinking" or "pondering".
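To make the "dumb pattern matching" point concrete: generation is just next-token prediction run in a loop over learned statistics. A minimal sketch, assuming the Hugging Face transformers library and using gpt2 purely as a stand-in model:

```python
# Minimal sketch of what "autocorrect++" does mechanically: score every
# possible next token, pick one, append it, repeat. No goals, no plans.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # gpt2 is just a stand-in
model = AutoModelForCausalLM.from_pretrained("gpt2")

tokens = tokenizer("The cat sat on the", return_tensors="pt").input_ids
for _ in range(10):
    with torch.no_grad():
        logits = model(tokens).logits        # scores for every candidate next token
    next_id = logits[0, -1].argmax()         # greedily take the most likely one
    tokens = torch.cat([tokens, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(tokens[0]))
```

Greedy argmax is used here only to keep the sketch short; real chat systems sample from the distribution and wrap the loop in a lot of scaffolding, but the core mechanism is the same.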
My point is those objectionable thought patterns already exist in the world which is why LLMs are able to match against them.
I know, but those thoughts tend to be held by a minority of people with (usually) quite little power to enact change. But if those thoughts were to be held by an AI agent which is extremely knowledgeable and skilled at multiple domains and also has the ability to take unilateral action, it could lead to scary results.
I'm not saying I believe any of this will happen, I'm just trying to see it from the perspective of the author.
How would AI be able to "take unilateral action"?
I think the idea that AI is going to start to self-replicate and improve itself -- although promoted by the AI industry -- is fanciful. There is no "intent"; there is no "mind".
When you sit watching the LLM input cursor blinking, it's not secretly thinking something. It's not "waiting" or "planning"....it's effectively "turned off" at that moment.
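To illustrate that "turned off" point with a toy (the bigram table below is purely illustrative, not any real model): the weights are inert data, and computation only happens inside an explicit call.

```python
# Toy illustration: a "model" is frozen numbers plus a function you call.
# Between calls nothing executes, so nothing is waiting or planning.
WEIGHTS = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}  # inert data

def complete(prompt: str, steps: int = 5) -> str:
    words = prompt.split()
    for _ in range(steps):                   # all the "work" happens right here
        words.append(WEIGHTS.get(words[-1], "the"))
    return " ".join(words)

print(complete("the"))
# Once this returns, WEIGHTS is just a dict sitting in memory -- as
# "turned off" as a file on disk until the next call.
```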
Again, this is why I call it autocorrect++, which is to try to undo some of the damage sister-raping worldcoin scammers (AI execs) have done to the public's mind. The AI industry promotes these ridiculous scare tactics in order to make their creation seem "so important that it's dangerous". But the only danger is attributing intent where there is none.
I was writing "garbage in, garbage out". But you beat me to it.
Thought experiment:
Neuralink gets real and you can now indicate what you want in case of total cognitive failure:
What's it gonna be?
I would just choose to die. To those of us who believe the soul/"consciousness spirit" is immortal, death isn't that terrible of a prospect.
However, I can understand why many materialist tech-bros are so panicked about wanting to develop autocorrect++ so it offers them some feeble glimpse of immortality.
Me too, but that's a controversial choice in some circles, so if I don't get that particular choice I'd choose plant.
Technically wouldn't both choices be "plant"?
Or are we saying your brain works fine, it's just that you can't control your speech?
Exactly! But how do we raise awareness of that?!? All this popular chatbot usage (best buddies, waifus, constructs of worship) is like using a chainsaw as a pillow and then proudly showing off how it cut a large chunk out of your cheek and now your teeth show through... it's a very bad use case.
I'd say no, technically a plant in both cases.
You're absolutely right — at the core, LLMs (and by extension, "autocorrect++") are doing advanced pattern matching, not reasoning or applying intent. That said, the illusion of intent is what makes it powerful and dangerous. When models reflect biased or overrepresented patterns in training data, it’s not because they "think" that way — it's because we fed them overwhelming associations.
The issue isn't just misunderstanding how LLMs work — it's also how we, as humans, project agency onto them. So yes, they're dumb pattern matchers... but when the patterns are shaped by flawed or skewed data, the output starts to look pretty dumb too — and people still trust it.
Careful interpretation, transparency, and responsible deployment are key — especially as these models get integrated into more critical tools.