Why does AI use something that human writers generally don’t?

69 sats \ 3 replies \ @adlai 8 Jan

training material isn't only the comment sections of social media... print media are where the different dash types originated, and they probably get weighted more heavily in lots of training datasets.

I get that. I guess a different way to phrase it would be "Why were the training materials so biased towards use of this thing that it immediately became a signal of AI creation?"

69 sats \ 1 reply \ @adlai 8 Jan

Honestly, I don't think it's a bad thing, and if I were employed at one of the providers, I would be lobbying at every opportunity to maintain the human-obvious fingerprint of LLM output... it's like finding a catchy name for a new soft drink that works well within the music of the local dialect, although it is obviously a foreign word. It advertises itself.

I don't really care one way or the other. I just find these methodological issues interesting.

It was probably trained on "format" content that does use them

I'm guessing that a lot of word processors like MS Word will autocorrect typed hyphens (a double hyphen, for example) to an em-dash.
