
I tend to skim the surface of the AI news world, which is probably why most of what I end up seeing is one or another AI pontificator talking about how a particular LLM can't say this or can say that as if it's some kind of gotcha: aha! We caught the evil mad scientists behind the screen with their agenda hanging out!
In this case, people apparently got Grok to say lots of pro-Hitler things. Last year it was exciting and astounding that LLMs were giving people "diversity" pictures of historical events. Remember black George Washington crossing the Delaware? Cue the soyjak pointing in outrage at fill-in-your-blank.
These outrages seem completely unimportant to me.
The real problem at hand is how much trust people place in the answers they receive when working with an LLM. Maybe the best outcome is that we all get seeded with a very strong distrust of LLM outputs -- at least enough distrust to check our answers once in a while.
202 sats \ 5 replies \ @optimism 23h
The real problem at hand is how much trust people place in the answers they receive when working with an LLM. Maybe the best outcome is that we all get seeded with a very strong distrust of LLM outputs -- at least enough distrust to check our answers once in a while.
I think that the outrage is an important counterweight to the exaggerated claims from all the LLM bosses. They just spent billions on something that is both great (from a big-data aggregation / achievement perspective) and mediocre (from a usability / fit-with-the-advertised-use-cases perspective) at the same time, and they need to reinforce the success to get even more billions to improve the latter by any means possible.
Because both traditional news and social media are saturated with the billionaires and not with the boring real research, or even the yolo "hey, we found something interesting" "research", the world only gets to hear the banter. I'd suggest that the outrage is even too little, because which player has been decimated thus far? None. They all get billions more and, thus far, they spend it on the next model that is still mediocre, because there are no real breakthroughs (also see #1020821).
If more weight were given to what goes wrong, the money would potentially be spent on real improvement, not more tuning and reiteration with more input. As long as that's not the case and large-scale parasitic chatbot corporations can continue to iterate on subprime results, we'll be stuck with hallucinating, fake-it-till-you-make-it AI that is not fit for purpose.
reply
Outrage focused on these "<pick your model> said something naughty" episodes doesn't seem as valuable to me as outrage that the model just made up a line and added it to my dataset. We just don't find the latter kind of mistake outrageous.
Some people may want their LLM to talk about Hitler a certain way, and others may want it to always use inclusive language, but I assume that almost everybody wants the model to not invent things without telling us.
The morality hype may put pressure on the big players, but it's not necessarily pressure to make their models more reliable or more useful. It may just be pressure to make their models insipid when dealing with certain topics.
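To make the "made up a line and added it to my dataset" failure concrete, here's a minimal sketch of the kind of check I mean, assuming the extracted rows are supposed to be verbatim quotes from a source document; the function name and data are hypothetical, not from any real tool.

```python
# Hypothetical spot-check for fabricated rows: each extracted row is supposed
# to be a verbatim quote from the source, so anything not found there is flagged.

def flag_unsupported_rows(source_text: str, extracted_rows: list[str]) -> list[str]:
    """Return the rows that do not appear verbatim in the source."""
    normalized_source = " ".join(source_text.split()).lower()
    flagged = []
    for row in extracted_rows:
        normalized_row = " ".join(row.split()).lower()
        if normalized_row not in normalized_source:
            flagged.append(row)
    return flagged

# Toy example: the second row was invented by the model.
source = "Revenue was $10M in 2023. Headcount grew to 42."
rows = ["Revenue was $10M in 2023.", "Profit doubled in 2024."]
print(flag_unsupported_rows(source, rows))  # ['Profit doubled in 2024.']
```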
reply
That's a good point. What would be the right kind of outrage/pressure?
reply
I was thinking about this post about trust in LLMs when it comes to the code in pacemakers. The author ends with this postscript:
as I was writing it I discovered that I am truly horrified that my car's brakes will be programmed by a contractor using some local 7b model that specializes in writing MISRA C:2023 ASIL-D compliant software.
Outrage based on fuck-ups that kill or harm people is already here -- but maybe we can expand that base to include less catastrophic outcomes: "No, I'm not using a model that gets basic details wrong."
reply
98 sats \ 1 reply \ @carter 22h
We need to be able to tune it to our preferences.
reply
Yes! NPU farm is on my xmas wishlist.
reply
102 sats \ 1 reply \ @kepford 18h
The real problem at hand is how much trust people place in the answers they receive when working with an LLM
Indeed. And this is a pre-AI problem: how much trust people put in what media and political figures say. Almost all of the problems we have today are just human flaws. I'm with ya.
reply
102 sats \ 0 replies \ @kepford 18h
In the 2010s, people were all worked up over new media and false info. The last few years it's been deepfakes. Truth is, people believed lies long before the Internet, with no video or images, just words on paper. In general, experience has taught me to chill and look to human nature.
reply
102 sats \ 1 reply \ @0xbitcoiner 23h
I didn’t read the link, but I know Grok got hit with a prompt attack, and of course they’re not showing the prompts, as expected. Why do people get so hyped? Because humans have a superiority complex, and when they see ‘someone’ trying to act human, the instinct is to mock it. People still think AI is actually intelligent, when it’s just a freakin LLM! Hahaha.
reply
Fair point. Superiority complex may be some of it. People also seem to enjoy conspiracies, so perhaps also a dose of "this big evil company is trying to trick everyone."
reply
First, totally agree.
Second, people get excited when their priors get confirmed.
What it highlights, and what I think is important, is how prone these things are to biased answers.
Econometrics is all about identifying and correcting for biases. My concern about AI, as I understand it to function, is that it can’t correct for biased results or reasonably assess what biases are present.
reply
42 sats \ 1 reply \ @optimism 22h
Correct. The biases are taken out with reinforcement training. This used to be a human check but is now simply another model checking the answers: bias correction is currently second-hand, and the bias check itself is also subject to hallucination.
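A rough sketch of that model-checks-model loop, with hypothetical generate() and judge() placeholders standing in for the two LLM calls (they are not real APIs from any library); the point is only that the verdict is itself a model output and can be wrong in the same ways as the answer it grades.

```python
# Hypothetical model-grades-model loop; generate() and judge() are placeholders
# for two LLM calls, not functions from any real library.
from dataclasses import dataclass

@dataclass
class Verdict:
    acceptable: bool
    reason: str

def generate(prompt: str) -> str:
    """Placeholder for the answering model."""
    raise NotImplementedError

def judge(prompt: str, answer: str) -> Verdict:
    """Placeholder for the checking model. Its verdict is also a model output,
    so it can hallucinate a pass or a fail just like the answer it grades."""
    raise NotImplementedError

def answer_with_check(prompt: str, max_retries: int = 3) -> str:
    answer = ""
    for _ in range(max_retries):
        answer = generate(prompt)
        verdict = judge(prompt, answer)
        if verdict.acceptable:
            return answer  # "acceptable" according to a second model, not ground truth
    return answer          # give up and return the last attempt
```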
reply
I’m very skeptical of any automated process for bias correction.
The nature of bias is that there’s important unobserved stuff getting smuggled into the error term.
Unless it’s handled deliberately in a well-designed manner, it’s not going away.
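A minimal simulation of that "smuggled into the error term" point, assuming only numpy and a made-up data-generating process: omit a confounder and ordinary least squares confidently returns the wrong coefficient, which is exactly what an automated check would have to detect without knowing what was omitted.

```python
# Omitted-variable bias in a toy regression (illustrative data, not from the thread).
# True model: y = 2*x + 3*z + noise, where the unobserved z is correlated with x.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                # unobserved confounder
x = 0.8 * z + rng.normal(size=n)      # observed regressor, correlated with z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

# OLS of y on x only (with intercept): z gets absorbed into the error term.
X_short = np.column_stack([np.ones(n), x])
beta_short = np.linalg.lstsq(X_short, y, rcond=None)[0]

# OLS of y on both x and z recovers the true coefficient on x.
X_full = np.column_stack([np.ones(n), x, z])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]

print(f"x coefficient, z omitted:  {beta_short[1]:.2f}  (true value 2.0)")
print(f"x coefficient, z included: {beta_full[1]:.2f}")
```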
reply