Only now they've agreed to pay Reddit for it. This is what their third party lockdown was really all about.
They're helping themselves to your Lemmy comments for free, as that's just how it's designed. If you post anything publicly anywhere, it's getting slurped up by a bot somewhere.
I'm not a lawyer. But isn't the reason they had to go to reddit to get permission that users hand over ownership to reddit the moment you post? And since there's no such clause on Lemmy, they'd have to ask the actual authors of the comments for permission instead?
Mind you, I understand there's no technical limitation that prevents bots from harvesting the data, I'm talking about the legality. After all, public does not equate to public domain.
Well even if it was a legal argument, they wouldn't care. Like Facebook and all the rest. They say they don't share your data but we all know that's a lie
users hand over ownership to reddit the moment you post
Not ownership. Just permission to copy and distribute freely. Which basically is necessary to run a service like this, where user-submitted content is displayed.
And since there's no such clause on Lemmy, they'd have to ask the actual authors of the comments for permission instead?
It's more of a fuzzy area, but simply by posting on a federated service you're agreeing to let that service copy and display your comments, and sync with other servers/instances to copy and display your comments to their users. It's baked into the protocol, that your content will be copied automatically all over the internet.
Does that imply a license to let software be run on that text? Does it matter what the software does with it, like displaying the content in a third-party mobile app? What about when it engages in text-to-speech or braille conversion for accessibility? Or indexes the page for a search engine? Does AI training make any difference at that point?
The fact is, these services have APIs, and the APIs allow for the efficient copying and ingest of the user-created information, with metadata about it, at scale. From a technical perspective obviously scraping is easy. But from a copyright perspective submitting your content into that technical reality is implicit permission to copy, maybe even for things like AI training.
Reddit banned me through IP address or something. Any new account I create gets banned within 24 hours, even if I don't upvote a single post or comment.
I tried 10 new accounts, each with a new email address, and all of them were banned.
So I gave up and randomly changed all my good comments.
I've shifted permanently to Lemmy. I miss some of the most niche communities, but not enough to return to Reddit.
Edit: I didn't even commit any rule violation. I just took too long to switch away from a modded Reddit app, and I only logged in once. That doesn't justify blocking me from ever using Reddit.
I didn't delete my comments before nuking my account, but I'm pretty sure the grand majority were shitposts containing ample amounts of smut, gore and other ridiculous over the top shit. So I consider this a win.
All my Reddit comments have just said “Comment redacted in protest against Reddit's deranged attacks against third party apps, the community, and common sense. See y'all on Lemmy or Kbin once this embarrassment of a site is done enshittifying itself out of existence. Monetize this, u/spez, you greedy little pigboy. 🖕” since I edited them before moving here. 🤷‍♂️
I replaced all my comments with the same phrase before deleting them with PowerDeleteSuite. The comments were fully restored and visible through a google search (but not visible through the user page). My posts were not restored, AFAIK.
This was during the whole 3rd party API thing. Maybe it was just something done during that time, but they certainly got around the edit replacement trick before.
This form of propaganda is my pet peeve. It's not "your posts". As soon as you put something out in public, you don't get to have your cake and eat it too. It's out there, you shared it. Don't share it if you don't want humanity to ingest and use it.
It's not about it being used to train AI. It's about the AI either not being open source/I don't get access to it (i.e. not benefitting me) or reddit being paid for my comments (i.e. also not benefitting me).
If this AI training would get me or the public access to the AI, or I would be paid for my comments instead of Reddit, I'd be fine with it.
Yeah, but you don't get to choose that. You give away that right as soon as you participate in public discourse. It's a zero-sum game: either it's public for everyone or for no one.
Don't get me wrong, Reddit is a bitch but I think people want to cut their noses off to spite their faces here. It's much more important to have free information flow than to fuck reddit.
My fear is that people will vote in some really dumb rules to spite AI and restrict free information flow accidentally.
That's how it is currently and maybe also your opinion. But that doesn't mean it has to be like that in a society. It's your opinion that everything public can go private at any time (training proprietary private AI), but we can decide as a society that's not how we want to do things. We can require stuff that used public data to be public as well.
And yeah, I kinda get to choose that. As a democratic society, anything that the public (i.e. including me) decides, goes. Of course, if there are people like you that don't want stuff trained on public data to be required to be public, democracy will also work in the sense that we don't get that, as it is currently.
What makes you think that they are not scraping Lemmy too? The only reason they might not be is probably how niche Lemmy and the fediverse are, but I am sure there have been people already doing it.
I'm sure they are, but Reddit probably provides these companies with lots of personalized metadata they collect just for them which they may not get from Lemmy.
Fediverse is designed to do exactly that. It's free flow of information which is a good thing. Don't let corporations hijack this beautiful concept. We all want information to be free.
I’m not mad about the scraping. The LinkedIn scraping case pretty much cemented that there was nothing that could be done to stop it. I’m just mad that I can no longer use the app of my choice. No such problem with Lemmy.
Scraping through a website at the scale they are talking about isn't really viable. You need access to the API so that you can have very targeted requests.
This is why reddit changed their API pricing and screwed over everyone using third party apps. They can make more money selling access to LLM trainers than they could from having millions of people using apps that rely on the API.
Scraping at scale is actually cheaper than buying API access. It's a massive rising market. Try googling "web scraping service" and there are hundreds of services that provide an API to scrape any public web page, bypass the blocks for you, and render all of the JavaScript.
Scraping is nice for static content, no doubt. But I wonder at what point it is easier to request changes to a developing thread via API than to request the whole page with all nested content over and over to find the new answers in there.
Following a developing thread is a very tiny use case I'd imagine and even then you can just scrape the backend API that is used on the public page for the same results as private API.
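For anyone who wants the trade-off from the last couple of comments spelled out, here's a rough sketch. The thread shape, the field names (`id`, `created`, `replies`) and the `since` cursor are all made up for illustration; this is not Reddit's or Lemmy's actual API. A scraper has to re-read the whole nested thread and diff it against what it has already seen, while an API-style query only asks for what's newer than a cursor (simulated locally here).

```python
from typing import Iterator

# Hypothetical comment shape: {"id": int, "created": int, "replies": [...]}
Comment = dict


def walk(comments: list[Comment]) -> Iterator[Comment]:
    """Yield every comment in a nested thread, depth first."""
    for comment in comments:
        yield comment
        yield from walk(comment.get("replies", []))


def new_by_rescrape(thread: list[Comment], seen_ids: set[int]) -> list[Comment]:
    """Scraper style: walk the full thread again and keep only unseen IDs."""
    fresh = [c for c in walk(thread) if c["id"] not in seen_ids]
    seen_ids.update(c["id"] for c in fresh)
    return fresh


def new_by_cursor(thread: list[Comment], since: int) -> list[Comment]:
    """API style: return only comments created after the cursor.

    A real server would do this filtering on its side; it is simulated
    here so the sketch stays self-contained.
    """
    return [c for c in walk(thread) if c["created"] > since]


# Hypothetical thread: two top-level comments, one nested reply.
thread = [
    {"id": 1, "created": 100, "replies": [
        {"id": 2, "created": 110, "replies": []},
    ]},
    {"id": 3, "created": 120, "replies": []},
]
seen: set[int] = set()
first_pass = new_by_rescrape(thread, seen)   # everything is new on the first pass

# Someone replies deep in the thread.
thread[0]["replies"][0]["replies"].append({"id": 4, "created": 130, "replies": []})
second_pass = new_by_rescrape(thread, seen)  # full walk, one new comment found
via_cursor = new_by_cursor(thread, since=120)
```

Both approaches find the same new comment. The difference is that the rescrape has to transfer and walk the entire thread every time, which is the cost being argued about above.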
There's actually legal precedent against scraping a website through unofficial channels, even if the information is public.
But basically, if you scrape a website and hinder their ability to operate, it falls under "virtual trespassing".
I'm assuming it would be even worse now that everyone is using the cloud and that scraping their site would cause a noticeable increase in resource cost (and thus, directly cost them more money because of cloud usage fees).
It's why APIs are such a big deal. They provide you with an official, controlled, entry point to a platform's data.
It's the opposite!
There's legal precedent that scraping public data is 100% legal in the US.
There are a few countries where scraping is illegal, though, like Japan and China. European countries often also have "database protection" laws that forbid replicating public databases through scraping or any other means, but that has to be a big chunk of the overall database. There are also personally identifiable information (PII) protection laws that restrict storing people's data without their consent (like GDPR).
Source: I work with anti-bot tech, and we have to explain this to almost every customer who wants to "sue the web scrapers": lol, if LinkedIn couldn't do it, you're not suing anyone.
Refreshing to see a post on this topic that has its facts straight.
EU copyright allows a machine-readable opt-out from AI training (unless it's for scientific purposes). I guess that's behind these deals. It means they will have to pay off Reddit and the other platforms for access to the EU market. Or more accurately, EU customers will have to pay Reddit and the other platforms for access to AIs.
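In case "machine-readable opt-out" sounds abstract: one widely used machine-readable signal is a `robots.txt` file, and a well-behaved crawler can check it with Python's standard library. This is only a sketch. Whether `robots.txt` by itself counts as a valid opt-out under the EU rules is an assumption here, not something this thread establishes, and the `GPTBot` token and example.com URL are just example values.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that blocks one AI crawler token and allows everyone else.
# The contents are illustrative, not copied from any real site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets this user agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = may_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/post/1")
allowed = may_crawl(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/post/1")
```

Note that this only works if the crawler chooses to check. It's a signal, not a technical barrier, which is pretty much the point made earlier in the thread.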