That post reminded me that lemmee exists. Accounts didn't work that great when I first got here but I made one today and got verified. Logged out of Reddit for the last time and replaced my comments. Eff that place right in its a-hole. Good riddance.
Nothing, but the lemmy admins can't be the only ones profiting from it. Reddit killed 3rd party apps and academic research so they could be the sole profiteers of the user data.
I don't miss the dipshits, pun spammers, and smug power mods of reddit at all. I do miss their niche subs and smarter users. Like it or not, they do have some brainy folks peppered among the shit posters.
We have some good folks here, too. Just need more of them.
It's a shame reddit has been dialing up the shit faucet slowly enough that most of their users don't notice how awful it is now. They've grown accustomed to the poor quality of the content and weaponized greed of the owners.
In all honesty, when I joined Reddit right after digg went to shit, it was amazing. Reddit was great, 3rd party apps were welcome, their interface was straightforward, and they had none of that NFT gold shit.
I joined maybe 6 years ago, and there was a bit of shit talking and most posts had a troll answer hitting the most votes for some reason, but it was usually pretty good to scroll straight past and find some really insightful comments. There was a lot of good stuff around reddit, but slowly the absurd number of awards, NFT avatars, reposts, and ads every third post started to corrupt it. It was simple enough to switch to a third party app for quite a while, but the garbage slowly took over.
Even if they hadn't pulled 3rd party apps, it was getting pretty close to a point where it wasn't worth scrolling past the bullshit.
At that point, they were also open source which was super cool. I always wanted that profile badge you got for submitting a merged PR.
Reddit really went downhill fast after ~2015. I think Lemmy will get there eventually. I remember reddit being a lot smaller back then as well. It took a while to get to the point where niche communities could thrive and I do believe we'll see that happen here as well (even if it takes a decade or so)
Oh they're here too. They're not causing too much drama because there's not enough going on, but they're here. Some of them are admins of certain instances.
The ones that aren't here yet will eventually find their way here as Lemmy continues to grow. And the most concerning thing about that is how many more tools Lemmy is providing them to fuck with users.
on the other hand, if there's troves of free data, that takes the upper hand from the companies that can afford paying for it, and gives open source a much better chance at staying competitive.
Hm but don't you automatically own the stuff you create yourself, as long as you don't consent to giving it away? I don't know the terms and conditions of my Lemmy instance though.
When was the last time anyone read the T&Cs of a social media website?
They basically all have a clause to the effect that you grant them a permanent, irrevocable license to do whatever they want with anything you post.
You might still own the copyright to any content you produce, but by posting you’re granting them permission to do basically anything with it, including reselling it.
Well there's copyright law. There's already lawsuits happening so we'll have to see how this shakes out.
But even if the AI companies lose the lawsuits, I think it's likely they'll still have access to content where the T&C of the site says they're allowed to sell the data.
Yes but I think reddit is many times more valuable than Lemmy. I just haven't found the same level of very specific subreddits that have lots and lots of activity. Most of the traffic here is memes, politics, news and Linux lovin. On reddit if I needed to find a community about my local town it's no problem and there are tens or hundreds of daily posts. The same community does exist on Lemmy but the last post was 6 months ago.
I completely agree. There are lots of communities on Reddit that are missing on Lemmy. Have you tried posting your community? It might entice people to participate!
Yes, but I did not mean retroactively. Nor did I mean only on Reddit, by the way. However, making money from already published content is not what I consented to when I joined Reddit like 15 years ago.
You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
you agree that by posting messages, uploading files, inputting data, or engaging in any other form of communication with or through the Website, you grant us a royalty-free, perpetual, non-exclusive, unrestricted, worldwide license to use, reproduce, modify, adapt, translate, enhance, transmit, distribute, publicly perform, display, or sublicense any such communication in any medium (now in existence or hereinafter developed) and for any purpose, including commercial purposes, and to authorize others to do so.
Haven't dug up anything earlier than this, do you know of any?
Basically, you gave Reddit your approval long ago.
Yes I did, but it is not clear if these are enforceable in court, when they make us read those multi-page agreements that most people skip. Moreover, AI like today's did not exist, and one can easily argue that the agreement does not cover data use for AI like chatGPT, since neither side understood the implications of that. It is like how owning nukes is not covered by the second amendment.
The important thing here IMO is not so much the enforceability as the intent. It was always obvious that Reddit would do whatever they wanted with the stuff we published there because they said they would do whatever they wanted with the stuff we published there. Personally, I knew this and just shrugged because it's no skin off my back if they do whatever they want with the stuff I published there - I was having fun posting, which was my goal. If they figured out some way to make those posts valuable then bully for them. They weren't otherwise valuable to me so it costs me nothing.
It's the same here on the Fediverse. When I post this stuff I'm tossing it out into the ether. It's on an open protocol intended to broadcast my comments to any compatible instances, so even if there isn't some literal terms of service that I signed that says "this content may show up on Threads or wherever" I know that it might show up on Threads or wherever. If I was truly fundamentally opposed to that then I wouldn't post.
As you could have guessed, I am on the same page with one exception (or addition) - I want my content to be used for free for AI training. My objection to the Reddit agreement is that they want to paywall information needed for future progress.
Fortunately they may not really be able to. Reddit's comments and submissions are available here, and since this includes deleted content as well as the stuff that users have later edited away with scripts it may even be a better resource than what Reddit is offering itself. You'd need to train your AI in a legally permissive environment, of course, but there's places like that around the world and this is actually something that would advantage the "little guys" since they aren't as easy to target.
Who cares? Fuck reddit. Half the content is bots anyway. So, bots stealing content to train AI to make content, which the bots will steal and repost. Circle of death for reddit. Good luck with that IPO.
I stopped using reddit after they dropped the bomb on the devs and I'm not a fan of the company.
I understand the hatred towards them, but this is definitely expected from a company like reddit, and any other social media for that matter. As users we must be aware that we don't own the content on their platform.
I wouldn't be surprised if the same story comes from Instagram tomorrow, though I suppose there will be a bigger outcry then.
Don't know if it was against usage terms, but I have been able to get chatgpt answers written 'in the style of' various subreddits since the initial release (or perhaps the second release)
Honestly over the last year since the great migration, the discussions on lemmy have really grown and matured to the point where I don't really see the value of reddit anymore
The only use I have for Reddit anymore is for super niche information. For example we were planning to go to Six Flags Discovery Kingdom today but it's going to rain this afternoon. I checked their site and it said they were open 11-6, my BIL checked their app and at 11:30 it said they were currently closed. Found a Reddit post from someone confirming the park was closed for the weekend, and we didn't waste a trip up. (as an extra annoying aside, apparently this information was posted on Six Flags' Instagram page, because expecting a huge company to maintain a website is I guess just too much when they can offload it to social media.)
The real value of reddit for me lies in its cache of information contained in answers to questions from over the years. Whenever I'm looking online for a solution to a problem I'm trying to solve I'll eventually add "reddit" to the search and I almost always find the answer that way.
Damn. I keep meaning to use one of those things that deletes all your reddit data. I doubt it'll actually do anything (reddit has no ethical framework so they won't think twice about indexing "deleted" data) but I still need to do that.
I'd bet a year of my salary that it only deletes it from public view so people can no longer get help from Reddit's Google search results, but a copy (or more than one copy) is still retained on their internal servers.
They were. One user got so upset he live-streamed himself individually deleting every post and comment he’d ever made. Reddit restored it all right after.
The trick is to turn everything into randomized garbage and then delete it later. A lot of those purge services offer that feature. It just swaps the words with others; so on the surface it looks like proper written text, but it makes absolutely no sense.
Aside from removing your content that they're profiting from, it also feeds AI scrapers pure garbage in the event that your content is restored.
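For anyone wondering what the word-swap step might look like, here is a minimal sketch in Python. I don't know how any of the actual purge tools implement it, so the word list and the `scramble` function are made up purely for illustration.

```python
import random

# Tiny stand-in vocabulary (an assumption); a real tool would presumably
# draw from a much larger word list.
FILLER_WORDS = ["lamp", "river", "quietly", "seven", "orange", "borrow", "under", "cloud"]

def scramble(text, rng=None):
    """Replace every word in `text` with a random filler word."""
    rng = rng or random.Random()
    # Keep the same number of words so the result still has the rough
    # shape of the original comment, but none of its meaning.
    return " ".join(rng.choice(FILLER_WORDS) for _ in text.split())

print(scramble("this is my original comment"))
```

You'd overwrite each comment with the scrambled text first and only delete it afterwards.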
Me, I'd prefer to fill it in with fake news. Let them train their bots on 'taylor swift is an alien psyop trained to infiltrate the highest levels of govt to fulfill the agenda of the radical left interstellar warmongering fearlords ...'
Called this awhile back, this is why Reddit has such a high valuation.
Poisoning your data won't do anything but give them more data. Do you seriously think reddit servers don't track every edit you make to posts? You'd literally just be providing training data of original human vs poisoned. They'd still have your original post, and they have a copy of every time you edit it.
Whoever buys reddit will have sole access to one of the larger (I don't think largest though) pools of text training data on the internet, with full licensed usage of it. I expect someone like Google, FB, MS, OpenAI, etc would pay big $$$ for that.
"But can't people already scrape it?"
Well yes, but it's at best legally dubious in some places
Scraping data off reddit only gets you current versions of posts (which means you can get poisoned data, and can't see deleted content), and is extremely slow... if you own the server you have first class access to all posts in a database, including the originals and diffs of every time someone edited a post, and all the deleted posts too.
Think about if you perhaps wanted to train an AI to detect posts that require flagging for moderation, if you scrape reddit data, you can't find deleted posts that got moderated...
But, if you have the raw original data, you 100% would have a list of every post that got deleted by mods and even the mod message on why it was deleted
You surely can see the value of such data, that only owners of reddit are currently privy to atm...
They've also got vote counts and breakdowns of who is making those votes. This data will be worth more for AI training than any similar volume of data other than maybe the contents of Wikipedia. Assuming they didn't have it set up to delete the vote breakdowns when they archived threads.
Why are those breakdowns worth so much? Because they can be used to build profiles on each voter (including those who only had lurker accounts to vote with), so they can build AIs that know how to speak with the MAGA cult, Republicans who aren't MAGA, liberals, moderates, centrists, socialists, communists, anarchists. Not only that, they'll be able to look at how sentiments about various things changed over time with each of these groups, watch people move from one to another as their opinions evolved, see how someone pretends to be a member of whatever group (assuming they voted honestly and posted under their fake persona).
Oh and also, all of that data is available through the fediverse but it's free to train on to anyone who sets up a server. Which makes me question whether the fediverse is a good thing because even changing federation to opt-in instead of opt-out just covers whether your server accepts data from another. It's always shared.
Open and private are on opposite sides of a spectrum. You can't have both, best you can do is settle for something in the middle.
What if reddit also kept all deleted comments and posts? I'm sure there are shit loads of things people type out just to delete, thinking all the while it'll never see the light of day.
I'd be surprised if they don't keep all of that. There were a number of sites for looking at deleted posts. They'd just go and grab everything and compare what was still there with what wasn't and highlight the stuff that wasn't there anymore.
Which is also possible here, though the mod log reduces the need for it. But if someone is looking for posts people change their mind about wanting anyone to see, deleting it highlights it instead of hiding it for anyone who is watching for that.
I think that site was unddit, but yes those were posted then later deleted. I'm talking about just typing out a post or comment and never posting, just simply backing out of the page or hitting cancel. I'm not sure if any of that is stored on the site or just locally.
You would be able to tell by monitoring the network tab of the browser developer tools. If post requests are being made (which they probably are, though I’m too lazy to go check) while you are typing a comment, they are most likely saving work in progress records for comments.
They definitely do, it's common for such systems to never actually delete anything because storage is cheap.
It's likely just flagged deleted=true, and the searches just return WHERE [post].Deleted = False on queries on the backend.
So it looks deleted to the consumer, but it's all saved and squirreled away on the backend.
It's good to keep all this shit for both legal reasons (if someone posts illegal stuff then deletes it, you still can give it to the feds), as well as auditing (mods can't just delete stuff to cover it up, the original still exists and admins can see it)
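To make the soft delete idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `post` table, its columns, and the helper functions are all hypothetical; I have no idea what Reddit's actual schema looks like, this is just the general pattern.

```python
import sqlite3

# Hypothetical schema: table and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post ("
    "id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO post (body) VALUES (?)",
    [("first post",), ("second post",)],
)

def soft_delete(conn, post_id):
    # "Deleting" only flips a flag; the row and its body stay in the table.
    conn.execute("UPDATE post SET deleted = 1 WHERE id = ?", (post_id,))

def visible_posts(conn):
    # What the public-facing query returns.
    rows = conn.execute("SELECT body FROM post WHERE deleted = 0 ORDER BY id")
    return [row[0] for row in rows]

def all_posts(conn):
    # What anyone with backend access can still read.
    rows = conn.execute("SELECT body FROM post ORDER BY id")
    return [row[0] for row in rows]

soft_delete(conn, 1)
print(visible_posts(conn))  # ['second post']
print(all_posts(conn))      # ['first post', 'second post']
```

The user-facing side only ever runs the filtered query, so the post looks gone, while the unfiltered table still has everything.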
Which makes me question whether the fediverse is a good thing
I'd argue it's good, because it means open source AI has a fighting chance with FOSS data to train on without needing to fork over a morbillion dollars to Reddits owners.
Whatever use cases the reddit data can train on, FOSS researchers can repeat it on Lemmy data and release free models that average joes can use on their own without having to subscribe to shit like Microsoft Copilot and friends to stay relevant.
The problem (for most) was never that people's public posts/comments were being used for AI training, it was that someone else was claiming ownership over them and being paid for access, and the resulting AI was privately owned. The fediverse was always about avoiding the pitfalls of private ownership, not privacy.
It's exhausting constantly being "that guy," but it really needs to be said constantly; private ownership is at the core of nearly every major issue in the 21st century.
The same goes for piracy and copyright. The same goes for DMCA circumvention and format shifting content you own. The same goes for proprietary tech ecosystems and walled gardens. Private ownership is at the core of the most contentious practices in the 21st century, and if we don't address it shit like this will just keep happening.
So the old trick of “search term +reddit” no longer will work then huh?
I’ve already made a habit of adding date limiters to web results from before LLMs were made public… The SEO ‘optimization’ game of before was bearable, but the LLM spam just ruins so many search results with regurgitated garbage or teaspoon deep information
During the peak of the great purge, it was quickly becoming pointless. A lot of results were bringing up deleted posts. It took a while for search engines to catch up and start filtering a lot of those results out.
With respect to 2, it would stop others scraping the content to train more open models on. This would essentially give Reddit exclusive access to the training data.
Sounds like something a bunch of governments would be interested in. As you pointed out you get to see why human mods made certain decisions. Could give you an edge in manipulation.
In regards to the editing part, sure, I'm sure they can track your edit history. However, on a large scale, most edits are going to be to correct things. To determine if an edit was to poison the text, it would likely require manual review and flagging. There's no way they're going to sift through all of the edits on individual accounts to determine this, so it's still worthwhile to do.
Although they could sidestep the issue a bit by simply comparing the changes between edits. Huge changes could just be discarded, while minor ones are fine.
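A rough sketch of what that kind of filter could look like, using `difflib` from the Python standard library. The function names and the 0.8 cutoff are my own assumptions; nobody outside Reddit knows whether they do anything like this.

```python
from difflib import SequenceMatcher

def looks_like_minor_edit(original, edited, threshold=0.8):
    # ratio() is 1.0 for identical strings and falls toward 0.0 as they
    # diverge. The 0.8 cutoff is an arbitrary assumption.
    return SequenceMatcher(None, original, edited).ratio() >= threshold

def pick_training_text(original, edited):
    # Keep small corrections; fall back to the original when the edit
    # rewrote most of the post.
    return edited if looks_like_minor_edit(original, edited) else original

original = "I think teh answer is to restart the router."
typo_fix = "I think the answer is to restart the router."
scrambled = "0000 1111 2222 3333"

print(pick_training_text(original, typo_fix) == typo_fix)    # True
print(pick_training_text(original, scrambled) == original)   # True
```

A subtle poisoning edit that only changes a few words would slip through a check like this, so it cuts both ways.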
If they hadn't applied the same charges to legitimate 3rd party applications they could still do this and have avoided the massive community backlash.
Considering their horrible track record with advertising and selling Reddit premium this should be the single best way for them to finally monetize their platform. They didn't need to destroy what little credibility they had remaining to their users to get to this point, but for whatever reason they did.
Not only did they have the option, as I understand it the API was even configured as such since all requests from an app shared the same API key. They're basically whitelisting like this now but only for the accessibility oriented 3rd party apps.