I noticed that a site I use regularly for work (sparkbyexamples.com) recently started making you press an "Expand to Show Full Article" button before you get to see any code. It's not a paywall; you still get the content for free. My guess is that this is designed to prevent their data from being scraped for training LLMs. Makes sense: they provide quality content for free, so why should another company make money from it? I also read someone's theory that Reddit changed its API pricing for precisely the same reason.
It is only a small step from making people press "Expand to Show Full Article" to charging them 1 sat to push the same button. Just a thought...
Here's another thought: Hit the "contact us" button and suggest the idea to them.
I do this with sites regularly. Most don't take to the idea, at least not right away, and that's fine.
But in this case, charging a few sats for content access is a genius-level counter to mass LLM scraping.
reply
Have you ever convinced anybody? I guess it's worth a try.
@k00b @kr @ekzyis did you guys ever consider doing something like that here? Charge sats to access content via an API. "Real people" can sign up for an account but a scraper would likely move on.
reply
My opinion on this is (I don't speak for the others) that trying to differentiate between scrapers and humans doesn't work at scale. For example, the example using the "button" to show the full content. I see two options for how this could be implemented:
a) the content is already in the HTML in its raw form, and the button only hides it using JavaScript
b) the button actually triggers another HTTP request.
If it's obfuscated, you are fighting an uphill battle against companies which have a lot of incentive to mine as much data as possible. If it's just another HTTP request, it's as simple as doing another HTTP request.
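To illustrate option a), here is a minimal sketch using only Python's standard library. The page markup, the `full-article` id and the `display:none` style are all made up for illustration; I haven't checked how the actual site implements its button. The point is just that a parser reading the raw HTML never looks at the CSS, so "hidden" text comes out like any other text.

```python
from html.parser import HTMLParser

# Hypothetical page: the article body is already in the markup and is only
# concealed by an inline style until the button is clicked.
PAGE = """
<html><body>
<p>Teaser paragraph.</p>
<button id="expand">Expand to Show Full Article</button>
<div id="full-article" style="display:none">
<pre>df.show()</pre>
</div>
</body></html>
"""


class HiddenTextExtractor(HTMLParser):
    """Collect text inside the element with a given id, ignoring CSS.

    Sketch only: void elements such as <br> inside the target are not
    handled and would throw off the depth counter.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_hidden(html, target_id):
    """Return the text found inside the element whose id is target_id."""
    parser = HiddenTextExtractor(target_id)
    parser.feed(html)
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    print(extract_hidden(PAGE, "full-article"))
```

Option b) is even less work for a scraper: it just replays whatever request the button makes.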
I may be wrong about how much effort these companies would put in to get access to as much (raw) data as possible, but so far my opinion is that it's not worth coming up with solutions to distinguish humans from bots on the internet.
Everyone has to pay the same amount. If you want more data, you need to pay more.
Regarding charging sats for an API: We already kind of do that since you have to pay to comment etc.
But I guess you mean to charge more sats for API access.
Then it comes back down to whether the AI companies think easier access to the data is worth paying for, or whether they just try to scrape the data without using a dedicated API (regardless of the consequences for the site's infrastructure).
So basically it comes down to me disagreeing with this:
but a scraper would likely move on.
reply
Regarding charging sats for an API: We already kind of do that since you have to pay to comment etc.
But I guess you mean to charge more sats for API access.
Oh, I need to learn to read more carefully before answering ... you mentioned charging sats to access/view content:
Charge sats to access content via an API
I think we haven't had this idea (yet?), but it comes back down to whether it's effective against scrapers or whether they would just pretend to be human and get access for free.
reply
Okay, well what if you just charged everyone for access? Both read and write. If you're an active participant on SN, you can afford it. Pretty much anyone who posts a bio here gets at least a few hundred sats.
The idea makes it difficult for people to lurk before signing up (I did that for a while), but you can solve that with differential pricing for subs. Some subs can be completely open, others much more exclusive. Or, you can charge based on the age of a post.
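A rough sketch of what that differential pricing could look like, just to make the idea concrete. Everything here is an assumption on my part: the sub names, the base prices, the 30-day window, and the choice to make older posts cheaper (you could just as well go the other way). None of it reflects how SN actually works.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical base read prices per sub, in sats. 0 means completely open.
SUB_BASE_PRICE = {"meta": 0, "bitcoin": 10, "exclusive": 100}
DEFAULT_BASE_PRICE = 10  # assumed price for subs not listed above

# Assumption: posts get cheaper with age and are free after 30 days.
FREE_AFTER = timedelta(days=30)


def read_price_sats(sub, posted_at, now):
    """Price in sats to read one post, by sub and post age."""
    base = SUB_BASE_PRICE.get(sub, DEFAULT_BASE_PRICE)
    age = now - posted_at
    if age >= FREE_AFTER:
        return 0
    # Linear decay from the base price (brand new) towards zero (30 days old).
    remaining = 1 - age / FREE_AFTER
    return math.ceil(base * remaining)


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    for sub in ("meta", "bitcoin", "exclusive"):
        for days in (0, 15, 30):
            price = read_price_sats(sub, now - timedelta(days=days), now)
            print(f"~{sub}, {days} days old: {price} sats")
```

Open subs stay free at any age, so lurkers still have somewhere to look around before signing up.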
reply
Even 21 sats (I think this is a nice symbolic number for digital self-defense) would cripple LLM scraping at scale. I think we have an existential necessity to move cyberspace away from the old freemium model, and even free access.
Corporations will continue to exploit all the content we put out here as training data, which is yet another form of energy extraction.
Meanwhile, for the individual user, the fractions of a penny we're talking about are trivially accessible.
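To put rough numbers on the asymmetry, here is a back-of-the-envelope sketch. The 21 sats per page comes from the comment above; the page counts (a reader opening 10 pages a day for 30 days, a scraper pulling a million pages) are made-up figures for illustration. The only hard fact is that 1 BTC is 100,000,000 sats.

```python
SATS_PER_BTC = 100_000_000   # definition of a satoshi
PRICE_PER_PAGE_SATS = 21     # the symbolic price suggested above


def total_cost_sats(pages, price=PRICE_PER_PAGE_SATS):
    """Total cost in sats of viewing `pages` pages at a flat per-page price."""
    return pages * price


def sats_to_btc(sats):
    """Convert sats to BTC."""
    return sats / SATS_PER_BTC


if __name__ == "__main__":
    reader_pages = 10 * 30        # hypothetical: 10 pages a day for a month
    scraper_pages = 1_000_000     # hypothetical: one crawl of a large site

    for label, pages in (("reader", reader_pages), ("scraper", scraper_pages)):
        sats = total_cost_sats(pages)
        print(f"{label}: {pages} pages, {sats} sats, {sats_to_btc(sats):.8f} BTC")
```

Under those assumptions the reader pays 6,300 sats for the month, while the scraper pays 21,000,000 sats (0.21 BTC) for a single pass over one site. Whether that actually deters a well-funded company is the open question from earlier in the thread.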
reply
For example, the example using the "button" to show the full content.
I don't know why I put "button" in quotation marks
reply
But in this case, charging a few sats for content access is a genius-level counter to mass LLM scraping.
And I think the only viable one
reply
For anyone looking for a lightning paywall platform, check out WordForm - https://wordform.space
reply
Yes, this AI thing is definitely causing paywall creation... another big opportunity for the flow of sats
reply