I called this shit out like a year ago. It's the end of any viable online searching having much truth to it. All we'll have left to trust is YouTube videos from Project Farm.
It kinda seems like the end of the Google era. What will we search Google for when the results are all crap? These are the death gasps of the internet I/we grew up with.
Remember when you could type a vague plot of a film you’d heard about into Google and it’d be the first result?
I honestly don't remember this at all. I remember priding myself on my "google-fu" and knowing how to search to get what I, or other people, needed. That usually required understanding the precise language you would need to use, not something vague. But over the years it's gotten harder and harder, and now I get frustrated with how hard it has become to find something useful. I've had to go back to finding places I trust for information and looking through them.
Although, ironically, I can do what you're talking about with AI now.
Cause in my early childhood in 2003-2007 we would resort to search engines only when we couldn't find something by better (but more manual and social) means.
Because - mwahahaha - most of the results were machine-generated crap.
So I actually feel very uplifted by people promising the Web will get back to the norm in this sense.
I ran into this issue while researching standing desks recently. There are very few places on the internet where you can find verifiably human-written comparisons between standing desk brands. Comments on Reddit all seem to be written by bots or people affiliated with the brands. Luckily I managed to find a YouTube reviewer who did some real comparisons.
Is it really a solution, though, or is it just GIGO?
For example, GPT-4 is about as biased as the medical literature it was trained on, not less biased than its training input, and thereby more inaccurate than humans:
All the latest models are trained on synthetic data generated by GPT-4, even the newer versions of GPT-4. OpenAI realized it too late and had to edit their license after Claude was launched. Human-generated data could only get us so far. The recent Phi-3 models, which perform very well for their size (3b parameters), can only achieve this feat because of synthetic data generated by AI.
I didn't read the paper you mentioned, but recent LLMs have progressed a lot, not just in benchmarks but also when evaluated by real humans.
Here's my prediction. Over the next couple decades the internet is going to be so saturated with fake shit and fake people, it'll become impossible to use effectively, like cable television. After this happens for a while, someone is going to create a fast private internet, like a whole new protocol, and it's going to require ID verification (fortunately automated by AI) to use. Your name, age, and country and state are all public to everybody else and embedded into the protocol.
The new 'humans only' internet will be the new streaming and eventually it'll take over the web (until they eventually figure out how to ruin that too). In the meantime, they'll continue to exploit the infested hellscape internet because everybody's grandma and grampa are still on it.
Yup. I have my own prediction - that humanity will finally understand the wisdom of PGP web of trust, and using that for friend-to-friend networks over Internet. After all, you can exchange public keys via scanning QR codes, it's very intuitive now.
That would be cool. No bots. Unfortunately, corps, govs and other such mythical demons really want to be able to automate influencing public opinion. So this won't happen until the potential of the Web for such influence is sucked dry. That is, until nobody in their right mind would use it.
That sounds very reasonable as a prediction. I could see it being a pretty interesting black mirror episode. I would love it to stay as fiction though.
The problem is the magnitude, but yeah, even before 2020 Google was becoming shit and being overrun by shitty blogspam trying to sell you stuff with articles clearly written by machines. The only difference is that back then it was easier to spot and harder to do. But they did it anyway.
These things became shit around 2009. Or immediately after becoming sufficiently popular to press out LiveJournal and other such (the original Web 2.0, or maybe Web 1.9 one should call them) platforms.
What does this have to do with search engines? Well, when they existed alongside web directories and other alternative, more social and manual ways of finding information, you'd just go to those if search engines became too direct in promoting things and hiding what they don't want you to see. You'd be able to compare one to another and notice that Google works badly in this case. You wouldn't be influenced in the end result.
Once what Google gives you became the criterion for what you're supposed to associate with such a request, and the same for social media, it was decided.
This is a direct consequence of Google targeting Reddit posts in its search results. Hopefully forum groups like Lemmy don't get buried under a mountain of garbage as well. As long as advertisers are able to destroy public forums and communities with ads, with ad-revenue-based sites like Google directing who to target, we will always be creating something great while constantly trying to keep advertisers from turning it into a pile of crap.
The history of TV, in reverse. And then forward again.
At first, it was an impossibly expensive medium ruled by a cartel of agencies and advertisers. Eventually, HBO comes along and shows you don't have to just make a bunch of lowest common denominator drivel.
Netflix eventually shows that the internet can be a way cheaper model than cable. Finally, money shows up in the streaming model, remaking advertiser friendly cable in the internet age. All in about 2.5 decades.
It's gross, but also inevitable. If there's an untapped niche to make money from, somebody's going to try it -- plus if they want to waste their money on generating accounts only to have them be banned, then so be it.
Makes me kinda thankful that this community is smaller and less likely to be targeted by this sort of crap.
What's funny is I think it would be profitable for maybe, like, a year, before everyone starts doing it and then even normal people stop trusting reddit comments.
It's like pissing in a pool to sell people soap. What's the plan once people stop using the pool?
In this case it's creating a kind of anti-value - harm, I guess.
Also I bow to your superior and brazen use of mixed metaphors. You got double what I did. "Bleeding" a cow dry? It adds impact over the usual "milking" even!
AI is a tool. It can be used for good and it can be used for poison. Just because you see it being used for poison more often doesn't mean you should be against AI. Maybe lay the blame on the people using it for poison.
Yeah, I've noticed that a bit lately anyways. Maybe I'm looking up stuff that has less of a community on Reddit, and thus has less discussion, but I have absolutely noticed some comments have a single product name-drop with little clarity for why they liked the product. It starts to feel like they're just ads (generated or otherwise) meant to trick you into thinking Reddit users are liking the product.
AI is just going to make it worse, and cause Reddit to stop being a good go-to for actual reviews and discussion of pros/cons.
There's an excellent chance that even some of the "authentic" discussions you see are word-for-word reposts of old posts and comments, created by bots to build up karma in order to be sold to spammers and influence peddlers down the line.
The first obvious wave of this stuff, to me, was the video conversion ripoff software and similar. They had people looking around for questions their software was possibly a solution for. Sometimes they would act like users, other times it was more neutral info, but still clear it was self promotion because of what was recommended.
I wanted to figure out what game hosting sites were good and Google pointed me to Reddit... every thread was full of boilerplate ads for different sites. The comments were the most obvious, marketing-approved sentences I've ever seen.
Everything I can find online seems to be advertisements or paid reviews (also advertisements) when looking for anything anymore. Businesses are terrified of an open, honest conversation about what is good and what is not.
The only thing we reasonably have is security through obscurity. We are something bigger than a forum but smaller than Reddit, in terms of active user size. If such a thing were to happen here, mods could handle it more easily probably (like when we had the spammer of the Japanese text back then), but if it were to happen on a larger scale than what we have it would be harder to deal with.
I think the real danger here is subtlety. What happens when somebody asks for recommendations on a printer, or complains about their printer being bad, and all of a sudden some long-established account recommends a product they've been happy with for years? And it turns out it's just an AI bot shilling for Brother.
For one, well established brands have less incentives to engage in this.
Second, in this example, the account in question being a "long established user" would seem to indicate you think these spam companies are going to be playing a long game. They won't. That's too much effort and too expensive. They will do all of this on the cheap, and it will be very obvious.
This is not some sophisticated infiltration operation with cutting edge AI. This is just auto generated spam in a new upgraded form. We will learn to catch it, like we've learned to catch it before.
I mean, it doesn't have to be expensive. And it also doesn't have to be particularly cutting edge. Start throwing some credits into an LLM API and have it randomly read and help people out in different groups. Once it reaches some amount of reputation, have it quietly shill for them. Pull out posts that contain keywords. Have the AI consume the posts and figure out if they are actually about what the keywords suggest. Have it subtly do product placement. None of this is particularly difficult or groundbreaking. But it could help shape our buying habits.
There's one advantage on the fediverse. We don't have the corporations like reddit manipulating our feeds, censoring what they dislike, and promoting shit. This alone makes using the fediverse worth it for me.
When it comes to problems involving the users themselves, things aren't that different, and we don't have much to do.
They can perhaps create instances, pay malicious users, or try some embrace, extend, extinguish approach or something, but they can't manipulate the code running on the instances we use, so they can't have direct power over it. Or am I missing something? I'm new to the fediverse.
There's very little to prevent them just pretending to be average users and very little preventing someone from just signing up a bunch of separate accounts to a bunch of separate instances.
No great automated way to tell whether someone is here legitimately.
Federation means if you are federated then sure you get some BS. Otherwise, business as usual. Now, making sure there is no paid user or corporate bot is another matter entirely since it relies on instance moderators.
We don't have the corporations like reddit manipulating our feeds, censoring what they dislike, and promoting shit.
Corporations aren't the only ones with incentives to do that. Reddit was very hands off for a good long while, but don't expect that same neutral mentality from fediverse admins.
I kind of feel the opposite. For a lot of instances, 'mods' are just a few guys who check in sporadically, whereas larger companies can mobilize full teams in times of crisis. It might take them a bit of time to spin things up, but there are existing processes to handle it.
If a community is so small that the mod team can be so inactive, there's no incentive for the company to put any effort into spamming it like you're suggesting.
And if they do end up getting a shit ton of spam in there, and it sits around for a bit until a moderator checks in, so what? They'll just clean it up and keep going.
I'm not sure why people are so worried about this. It's been possible for bad actors to overrun small communities with automated junk for a very long time, across many different platforms, some that predate Reddit. It just gets cleaned up and things keep going.
It's not like if they get some AI produced garbage into your community, it infects it like a virus that cannot be expelled.
Some will get through and sit for a few days but eventually the account will make itself obvious and get removed.
It's not exactly difficult to spot these things. If an account is spending the majority of its existence on a social media site talking about products, even if they add some AI generated bullshit here and there to make it seem like it's a regular person, it's still pretty obvious.
If the account seems to show up pretty regularly in threads to suggest the same things, there's an indicator right there.
Hell, you can effectively bait them by making a post asking for suggestions on things.
They also just tend to have pretty predictable styles of speech, and never fail to post the URL with their suggestion.
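For anyone who mods and wants to try the tells above, here is a rough Python sketch of how they might be turned into numbers for one account's comment history. Everything in it is made up for illustration: the `Comment` shape, the function names, the product term list, and the cutoffs are all assumptions, not anything Lemmy or Reddit actually provides. It only counts three things: how much of the history mentions a product, how often those mentions come with a URL, and how many different threads the same product shows up in.

```python
import re
from dataclasses import dataclass

# Very loose URL matcher; good enough for a rough signal.
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Comment:
    thread_id: str  # hypothetical identifier for the thread the comment is in
    text: str


def shill_signals(comments, product_terms):
    """Compute three rough signals from one account's comment history."""
    if not comments:
        return {"product_ratio": 0.0, "url_ratio": 0.0, "top_repeat_threads": 0}

    terms = [t.lower() for t in product_terms]
    product_hits = 0  # comments mentioning at least one product term
    url_hits = 0      # of those, how many also contain a URL
    threads_per_term = {t: set() for t in terms}

    for c in comments:
        lowered = c.text.lower()
        mentioned = [t for t in terms if t in lowered]
        if mentioned:
            product_hits += 1
            if URL_RE.search(c.text):
                url_hits += 1
        for t in mentioned:
            threads_per_term[t].add(c.thread_id)

    return {
        # share of the account's comments that talk about products
        "product_ratio": product_hits / len(comments),
        # share of product comments that include a link
        "url_ratio": url_hits / product_hits if product_hits else 0.0,
        # most distinct threads in which any single product was pushed
        "top_repeat_threads": max(
            (len(s) for s in threads_per_term.values()), default=0
        ),
    }


def looks_like_shill(signals, ratio_cutoff=0.5, repeat_cutoff=3):
    """Flag for human review; the cutoffs are arbitrary guesses, not tuned."""
    return (
        signals["product_ratio"] > ratio_cutoff
        and signals["top_repeat_threads"] >= repeat_cutoff
    )


history = [
    Comment("t1", "I love my AcmePrint, get it at https://example.com/acme"),
    Comment("t2", "Honestly AcmePrint fixed this for me https://example.com/acme"),
    Comment("t3", "Try AcmePrint! https://example.com/acme"),
    Comment("t4", "Nice weather today."),
]
signals = shill_signals(history, ["acmeprint"])
print(signals, looks_like_shill(signals))
```

This would only ever be a way to queue accounts for a human to look at. A real enthusiast who keeps recommending the same thing would trip it too, and a bot that varies its wording or skips the link would slip past.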
Yep, it's sort of what Google used to be. It took me a bit of setup tho. They really like to default to showing you a ton of news and crap. But after turning that all off I'm left with a super clean UI and useful search results.
I do kind of feel like this part of the experiment might just be coming to a close.
There's no "if AI just keeps getting more insidious", the barrier for entry is too small. AI is going to keep doing the things it's already doing, just more efficiently, and it doesn't matter that much how we feel about whether those things are good or bad. I feel like the things it is starting to ruin are probably just going to be ruined.
Peer-to-peer systems? Maybe systems where you have to physically be at the location to get data, so cyber cafe like things. Or back to the old system: go to the regular bars, repair cafés or hobby places.