
The real problem at hand is how much trust people place in the answers they get from an LLM. Maybe the best outcome is that we all get seeded with a very strong distrust of LLM outputs -- at least enough distrust to check our answers once in a while.
I think the outrage is an important counterweight to the exaggerated claims from all the LLM bosses. They just spent billions on something that is simultaneously great (as a feat of big-data aggregation) and mediocre (at fitting the advertised use cases), and they need to reinforce the narrative of success by any means possible to raise even more billions to improve the latter.
Because both traditional news and social media are saturated with the billionaires rather than the boring real research -- or even the yolo "hey, we found something interesting" kind of "research" -- the world only gets to hear the banter. I'd argue the outrage is actually too little: which player has been decimated so far? None. They all get billions more, and so far they spend it on the next model that is still mediocre, because there are no real breakthroughs (also see #1020821).
If more weight were given to what goes wrong, the money would potentially be spent on real improvement, not more tuning and reiteration with more input. As long as that's not the case and large-scale parasitic chatbot corporations can keep iterating on subprime results, we'll be stuck with hallucinating fake-it-till-you-make-it AI that is not fit for purpose.
Outrage focused on "<pick your model> said something naughty" doesn't seem as valuable to me as outrage that the model just made up a line and added it to my dataset. We simply don't find the latter kind of mistake outrageous.
Some people may want their LLM to talk about Hitler a certain way, and others may want it to always use inclusive language, but I assume almost everybody wants the model not to invent things without telling us. A minimal sketch of the kind of check I mean, in Python -- every "extracted" line should be traceable back to the source text, and anything that isn't gets flagged instead of silently merged (all names here are illustrative, not any real library's API):

```python
# Minimal sketch: flag LLM-"extracted" records that have no support in the
# source text, instead of silently appending them to the dataset.
# All names here are illustrative, not a real library's API.

def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so trivial reformatting still matches."""
    return " ".join(s.lower().split())

def split_supported(source_text: str, extracted: list[str]) -> tuple[list[str], list[str]]:
    """Partition extracted lines into (supported, unsupported)."""
    haystack = normalize(source_text)
    supported, unsupported = [], []
    for line in extracted:
        (supported if normalize(line) in haystack else unsupported).append(line)
    return supported, unsupported

source = "Revenue grew 12% in Q3. Headcount stayed flat."
extracted = [
    "Revenue grew 12% in Q3.",
    "Headcount stayed flat.",
    "Margins improved by 5%.",  # the model invented this one
]

ok, invented = split_supported(source, extracted)
for line in invented:
    # The point: reject loudly, rather than letting fabrications into the data.
    print(f"REJECTED (no support in source): {line}")
```
The morality hype may put pressure on the big players, but it's not necessarily pressure to make their models more reliable or more useful. It may just be pressure to make their models insipid when dealing with certain topics.
reply
That's a good point. What would be the right kind of outrage/pressure?
reply
I was thinking about this post about trust in LLMs when it comes to the code in pacemakers. The author ends with this postscript:
as I was writing it I discovered that I am truly horrified that my car's brakes will be programmed by a contractor using some local 7b model that specializes in writing MISRA C:2023 ASIL-D compliant software.
Outrage based on fuck-ups that kill or harm people is already here -- but maybe we can expand that base to include less catastrophic outcomes: "No, I'm not using a model that gets basic details wrong."
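In that spirit, the bar can be mechanical rather than emotional: before wiring a model into anything, quiz it on questions with known answers and refuse it if it flunks. A rough sketch, where `ask_model` is a stand-in for whatever client you actually call:

```python
# Rough go/no-go gate: quiz the model on facts you can verify yourself,
# and refuse to use it below a threshold. `ask_model` is a stand-in for
# whatever client/API you actually use; it is not a real library function.

from typing import Callable

KNOWN_FACTS = [
    ("What is 17 * 23?", "391"),
    ("In what year did the Apollo 11 landing happen?", "1969"),
    ("What is the chemical symbol for sodium?", "Na"),
]

def passes_gate(ask_model: Callable[[str], str], threshold: float = 1.0) -> bool:
    """Return True only if the model answers enough known questions correctly."""
    correct = sum(
        1 for question, expected in KNOWN_FACTS
        if expected.lower() in ask_model(question).lower()
    )
    print(f"gate score: {correct}/{len(KNOWN_FACTS)}")
    return correct / len(KNOWN_FACTS) >= threshold

# Example with a deliberately unreliable stub model:
if not passes_gate(lambda q: "I think the answer is 42"):
    print("No, I'm not using a model that gets basic details wrong.")
```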
reply
98 sats \ 1 reply \ @carter 23h
We need to be able to tune it to our preferences.
reply
Yes! NPU farm is on my xmas wishlist.
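For what it's worth, tuning to preference is already roughly feasible on local hardware via parameter-efficient fine-tuning. A minimal LoRA sketch with Hugging Face `transformers` and `peft` -- the base model name is a placeholder, and this is the shape of the thing, not a tested recipe:

```python
# Minimal LoRA fine-tuning sketch using Hugging Face `transformers` + `peft`.
# The base model name below is a placeholder; treat this as a sketch under
# the assumption you have a small local causal LM and your own preference data.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "some-org/some-7b-model"  # placeholder: any local causal LM

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Train only small low-rank adapters instead of all ~7B weights, which is
# what makes "tune it yourself" plausible on consumer NPUs/GPUs.
lora = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections; model-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of total params

# From here: feed it your own "how I want it to talk" examples with the
# usual transformers Trainer loop; the frozen base weights stay untouched.
```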
reply