

FaceDeer

@FaceDeer@fedia.io

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and then some time on kbin.social.


FaceDeer ,

NYT’s counsel Ian Crosby previously told Ars that OpenAI’s decision "to enter into deals with news publishers only confirms that they know their unauthorized use of copyrighted work is far from 'fair.'"

It means they know that they'll get harassed with lawsuits and are willing to spend some money to preemptively buy off potential litigants before it gets to that point. Whether that's "fair" is not really relevant to the realities of the legal system.

FaceDeer ,

Since I find AIs to be useful that sounds fine to me.

Why Is There an AI Hype? | The Luddite (theluddite.org)

Companies are training LLMs on all the data that they can find, but this data is not the world, but discourse about the world. The rank-and-file developers at these companies, in their naivete, do not see that distinction....So, as these LLMs become increasingly but asymptotically fluent, tantalizingly close to accuracy but...

FaceDeer ,

Companies are training LLMs on all the data that they can find, but this data is not the world, but discourse about the world.

I mean, the same can be said for your own senses. "You" are actually just a couple of kilograms of pink jelly sealed in a bone shell, being stimulated by nerves that lead out to who knows what. Most likely your senses are giving you a reasonably accurate view of the world outside but who can really tell for sure?

So, as these LLMs become increasingly but asymptotically fluent, tantalizingly close to accuracy but ultimately incomplete, developers complain that they are short on data.

Don't let the perfect be the enemy of the good. If an LLM is able to get asymptotically close to accurate (for whatever measure of "accurate" you happen to be using) then that's really super darned good. Probably even good enough. You wouldn't throw out an AI translator or artist or writer just because there's one human out there that's "better" than it.

AI doesn't need to be "complete" for it to be incredible.

FaceDeer ,

Indeed. I've never been to Australia. I've never even left the continent I was born on. I am reasonably sure it exists, though, based on all the second-hand data that I've seen. I even know a fair bit about stuff you can find there, like the Crow Fishers and the Bullet Farm and the Sugartown Cabaret.

FaceDeer ,

Hah! Of course a female librarian would miscategorize a tome like that! Why was a woman placed in charge of books anyway? I suppose they need her to keep the shelves well-dusted and tidy?

(/s of course)

The ugly truth behind ChatGPT: AI is guzzling resources at planet-eating rates (www.theguardian.com)

Despite its name, the infrastructure used by the “cloud” accounts for more global greenhouse emissions than commercial flights. In 2018, for instance, the 5bn YouTube hits for the viral song Despacito used the same amount of energy it would take to heat 40,000 US homes annually....

FaceDeer ,

Liters are a great unit for making small things seem large. I've seen articles breathlessly talking about how "almost 2000 liters of oil was spilled!" when 2000 liters could fit in the back of a pickup truck.

Water "consumption" is also a pretty easy to abuse term since water isn't really consumed, it can be recycled endlessly. Whether some particular water use is problematic depends very much on the local demands on the water system, and that can be accounted for quite simply by market means - charge data centers money for their water usage and they'll naturally move to where there's plenty of cheap water.

FaceDeer ,

I'm Canadian. Milk comes in liters.

If you're saying that 2 cubic meters can't fit in the back of a pickup truck, here's some truck capacities. A cubic yard is 0.764555 cubic meters, so a full sized pickup can hold 3.4 cubic meters of cargo.
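The arithmetic behind that comparison can be checked with a quick sketch. The 2000 liter spill and the 3.4 cubic meter truck capacity are the figures quoted in this thread; the function and variable names are made up for illustration.

```python
# Volume conversions for the pickup truck comparison.
LITERS_PER_CUBIC_METER = 1000.0
CUBIC_METERS_PER_CUBIC_YARD = 0.764555  # conversion factor quoted in the comment


def liters_to_cubic_meters(liters: float) -> float:
    """Convert a volume in liters to cubic meters."""
    return liters / LITERS_PER_CUBIC_METER


def cubic_meters_to_cubic_yards(cubic_meters: float) -> float:
    """Convert a volume in cubic meters to cubic yards."""
    return cubic_meters / CUBIC_METERS_PER_CUBIC_YARD


spill_m3 = liters_to_cubic_meters(2000)            # 2.0 cubic meters
spill_yd3 = cubic_meters_to_cubic_yards(spill_m3)  # roughly 2.6 cubic yards
truck_capacity_m3 = 3.4  # full-size pickup figure quoted in the comment

fits_in_truck = spill_m3 <= truck_capacity_m3
print(f"{spill_m3:.1f} m^3 = {spill_yd3:.2f} yd^3; fits in truck: {fits_in_truck}")
```

So 2000 liters is 2 cubic meters, which is comfortably under the quoted capacity.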

FaceDeer ,

The problem is that the litigation was entirely "just", as far as the legal system goes. It's an open-and-shut case and everyone saw it coming. The Internet Archive basically stood in front of a train and dared it to turn, and now they're crying the victim. Doesn't exactly entice me to send them donations to cover their lawyers and executives right now.

They really need to admit "okay, so that was a dumb idea, and ultimately not related to archiving the Internet anyway. We're not going to do that again."

Note that I'm not saying the publishers are "good guys" here, I hate the existing copyright system and would love to see it contested. Just not by Internet Archive. Let someone else whose purpose is fighting those fights take it on and stick to preserving those precious archives out of harm's way.

FaceDeer ,

The lawsuit was about them distributing unauthorized copies of books. Not archiving, and not internet pages or files.

And that was exactly the problem.

FaceDeer ,

Did you read literally the next sentences I wrote after that one? Here they are:

Just not by Internet Archive. Let someone else whose purpose is fighting those fights take it on and stick to preserving those precious archives out of harm's way.

The Internet Archive is like someone carrying around a precious baby. The baby is an irreplaceable archive of historical data being preserved for posterity. I do not want them to go and fight with a bear, even if the bear is awful and needs to be fought. I want them to run away from the bear to protect the baby, while someone else fights the bear. Someone better equipped for bear-fighting, and who won't get that precious cargo destroyed in the process of fighting it.

FaceDeer ,

doesn't change that IA is storing files, ebooks to be specific,

Emphasis added. Storing files is not the problem. Nobody cared when they were just scanning and storing them. The problem arose when they started giving out copies. And worse, giving out copies without restriction - libraries "lend" ebooks by using DRM systems to try to ensure that only a specific number of copies are out "in circulation" at any given time, and so the big publishers have turned a blind eye to that.

Internet Archive basically turned themselves into an ebook Pirate Bay, giving out as many copies as were asked for with no limits.

Again, I don't agree with current copyright laws, I think the big publishers are gigantic heaps of slime and should be burned to the ground. The problem here is that it's not Internet Archive that should be fighting this fight.

FaceDeer ,

Who else is better equipped?

The EFF, for example. Fighting lawsuits for the sake of internet freedom is their reason for being. Sci-hub, for ebooks more specifically. Or Library Genesis. Those are organizations specifically devoted to fighting against excessive copyright restrictions on books.

Just because you perceive them as unworthy to bear the challenge

You're not understanding what I'm saying here. I don't think Internet Archive is unworthy to bear the challenge. I think they're not well suited to it, and when they inevitably lose the lawsuits they've jumped head-first into they're risking damage to other causes that are very important and unrelated to this particular fight.

FaceDeer ,

This subthread switched specifically to the topic of their pending lawsuits, it's not about the DDoS. I doubt the publishers are behind this DDoS because they're already easily winning in the courts, there's absolutely no need for them to risk blowing their case and getting countersued this way.

FaceDeer ,

It probably wouldn't help their current lawsuit, at this point. Maybe right at the beginning, before it went to court and they could negotiate a bit in search of a reasonable settlement, but at this point they've already lost it hard.

What it would do is reassure me that they're not going to do something dumb like this in the future, which would make me more willing to donate money to them knowing it'll go to actual internet archiving activities instead of being thrown into big publishers' pockets as part of more lawsuit settlements.

FaceDeer ,

They're only at risk when they take risky behaviours. Simply archiving the Internet, like they've been doing for years, is not what they got sued over.

If they're going to keep doing the same thing they got sued over then they're going to keep losing court cases, because obviously they are. The definition of insanity is doing the same thing and expecting a different result. They should stop doing that.

FaceDeer ,

I explained why not in the sentence directly following the one that you quoted. Here it is again:

Let someone else whose purpose is fighting those fights take it on and stick to preserving those precious archives out of harm's way.

To explain in more detail: The Internet Archive is custodian to an irreplaceable archive of Internet history and raw data. If they go and get themselves destroyed at the hands of book publishers fighting lawsuits over ebook piracy, that archive is at risk of being destroyed along with them. Or being sold off at whatever going-out-of-business sale they have, perhaps even to those very giant publishers that destroyed them.

That is why not them in particular. Let someone who isn't carrying around that precious archive go and get into fights like this.

FaceDeer ,

Then the Internet Archive is being an idiot and risking a lawsuit. Again. They've already been raked over the coals for copyright violation, I guess they want to add libel to the list as well?

The Internet Archive has plenty of enemies, many of whom don't have an easy legal arsenal to throw at them like those big publishers did. The publishers have been playing smart so far and have won already through legal means, it makes no sense for them to suddenly turn stupid and launch this DDoS.

FaceDeer ,

Unlimited copies, look it up. Internet Archive's "emergency library" broke the customary limits that other libraries stick to in order to keep publishers off their backs - they were giving out as many copies of a book at once as people were requesting, rather than keeping a limited number "in circulation."

It really was basically just a piracy site all of a sudden. It's absolutely no surprise at all that the publishers came down on them like a ton of bricks.

FaceDeer , (edited )

Google actually was good, so there's probably some good information in this documentation. If nothing else we can perhaps figure out what "went wrong."

Edit: I've been reading the blog post by what appears to be the main person the leak was shared with, and there's a lot of in-depth analysis being done there, but I'm not seeing a link to the actual documents. This is a huge article, though, so I might be overlooking it.

PayPal Is Planning an Ad Business Using Data on Its Millions of Shoppers (www.wsj.com)

Wall Street Journal (paywalled) The digital payments company plans to build an ad sales business around the reams of data it generates from tracking the purchases as well as the broader spending behaviors of millions of consumers who use its services, which include the more socially-enabled Venmo app....

FaceDeer ,

This is indeed one of the things cryptocurrencies exist for, but social media denizens around these parts have long conditioned themselves to hate it.

So a rock and a hard place, it seems. Which is more hated: the big data-harvesting corporation co-founded by Elon Musk, or a big bad NFT-hosting blockchain?

For people who are concerned about data harvesting I would recommend something like Monero or Aztec over Bitcoin, though. Bitcoin's basically obsolete at this point, coasting on name recognition and inertia, and has no built-in privacy features.

FaceDeer ,

That's because this isn't something coming from the AI itself. All the people blaming the AI or calling this a "hallucination" are misunderstanding the cause of the glue pizza thing.

The search result included a web page that suggested using glue. The AI was then told "write a summary of this search result", which it then correctly did.

Gemini operating on its own doesn't have that search result to go on, so no mention of glue.

FaceDeer ,

The Fediverse doesn't have any defenses against AI impersonators though, aside from irrelevance. If it gets big the same incentives will come into play.

Reminder: The DMV uses photos for facial recognition

This is half a decade old news, but I only found this out myself after it accidentally came up in conversation at the DMV. The worker would not have informed me if it hadn't come into conversation. Every DMV photo in the United States is being used for AI facial recognition, and nobody has talked about it for years. This is...

FaceDeer ,

Only those who don't care about privacy and use Windows.

So most people, then.

FaceDeer ,

All industrial users pay lower, because they're able to apply economies of scale and locate themselves in places with lower power costs. Some of them are big enough that the utilities will build power lines and plants specifically to make electricity cheaper. It's not just a matter of "oh, they're rich, so we'll charge them less."

FaceDeer ,

No significant blockchains use GPUs any more. As for AI training, that produces AIs. It's not wasteful.

FaceDeer ,

Ah, some interesting technology news. Let's read about what new developments are being made-

Oh, wait, Elon Musk is involved. I hate technology! Emerald mines and cave rescue submarines!

FaceDeer ,

I use AIs for a variety of productive purposes. You may not, and that's fine, but that's just you. You can't dismiss anything that you personally don't have a use for as "wasteful."

FaceDeer ,

Like what's the problem being solved here?

Training AIs.

FaceDeer ,

Ah yes, the wrong kind of technology.

If it's a "hype cycle" I guess it'll be going away aaaaaany day now.

FaceDeer ,

Okay, so? NFTs aren't AI, and they don't use proof-of-work any more for that matter.

FaceDeer ,

I wasn't. I pointed out that no significant blockchains used GPUs (especially not Ethereum, the main NFT-supporting blockchain, which has transitioned to proof-of-stake instead of proof-of-work). That puts them outside the question of "wastefulness" altogether, and irrelevant to the subject at hand.

FaceDeer ,

I have no idea what point you're trying to make here. The comment I responded to said:

We really need to tax energy used by GPU-burning projects differently. AI training, blockchain, whatever. Such a wasteful endeavour.

And I pointed out that blockchain doesn't use GPUs any more. NFTs weren't even mentioned specifically. Then the thread went further into discussing AI specifically, not even blockchain at that point, and you jumped in to say "people still use nfts". It was almost a non-sequitur.

I'm not saying anything about NFTs. You don't need to jump in and "defend" them.

FaceDeer ,

But NFTs aren't wasteful. They're run on a proof-of-stake blockchain, no big computing power is used to back them. Your point about NFTs is false, I didn't mention NFTs in the first place, I don't see the relevance of any of this.

FaceDeer ,

What exactly are they "wasting?" Ethereum switched to proof-of-stake on 15 September 2022. If you are still criticizing NFTs for their environmental impact you're a year and a half out of date.

FaceDeer ,

You didn't answer the question. What exactly are they wasting? And what does this have to do with AI at this point, anyway? You jumped in with this NFT thing and I still fail to see the relevance.

FaceDeer ,

Oh no, they accommodated our desires and removed the requirement that we hated. The bastards.

ChatGPT Answers Programming Questions Incorrectly 52% of the Time: Study (gizmodo.com)

The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT....

FaceDeer ,

No, they're useful because they produce useful machine code.

FaceDeer ,

It's useful because it does the stuff we want it to do.

You're focusing on a very high level philosophical meaning of "usefulness." I'm focusing on what actually does what I need it to do.

FaceDeer ,

So if something isn't perfect it's not "useful?"

I use LLMs when programming. Despite their imperfection they save me an enormous amount of time. I can confidently confirm that LLMs are useful from personal direct experience.

FaceDeer ,

Yeah, things would be going so much better if garage hobbyists were developing these brain implants instead.

FaceDeer ,

SpaceX has a 64% market share in the global commercial rocket launch market for sending satellites, scientific instruments, and other payloads into orbit. In the first six months of 2023, SpaceX handled 21 flights for outside customers, or 64% of the worldwide total. In the first half of 2023, SpaceX handled 88 percent of customer flights from U.S. launch sites.[1]

If success isn't their goal I'd be amazed at what they accomplished if they decided to try for it someday.

FaceDeer ,

One of the common arguments I hear against technological advancement is "but what if some sociopath brews up a pandemic virus in their garage!"

The FDA is monitoring the corporations that are working on this sort of thing. As is mentioned in the title of this thread.

FaceDeer ,

No, my example is literally telling the AI that socks are edible and then asking it for a recipe.

In your quoted text:

When a model is trained on data with source-reference (target) divergence, the model can be encouraged to generate text that is not necessarily grounded and not faithful to the provided source.

Emphasis added. The provided source in this case would be telling the AI that socks are edible, and so if it generates a recipe for how to cook socks the output is faithful to the provided source.

A hallucination is when you train the AI with a certain set of facts in its training data and then its output makes up new facts that were not in that training data. For example if I'd trained an AI on a bunch of recipes, none of which included socks, and then I asked it for a recipe and it gave me one with socks in it then that would be a hallucination. The sock recipe came out of nowhere, I didn't tell it to make it up, it didn't glean it from any other source.

In this specific case what's going on is that the user does a websearch for something, the search engine comes up with some web pages that it thinks are relevant, and then the content of those pages is shown to the AI and it is told "write a short summary of this material." When the content that the AI is being shown literally has a recipe for socks in it (or glue-based pizza sauce, in the real-life example that everyone's going on about) then the AI is not hallucinating when it gives you that recipe. It is generating a grounded and faithful summary of the information that it was provided with.

The problem is not the AI here. The problem is that you're giving it wrong information, and then blaming it when it accurately uses the information that it was given.
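As a rough illustration of that flow, here's a toy sketch. Everything in it is hypothetical: the page store, `search`, `build_prompt`, `summarize` and `is_grounded` are stand-ins invented for this example, not how Google's Overview or any real LLM actually works. The "summarizer" just repeats what it was handed, which is enough to show where the glue comes from.

```python
# Toy model of "search, then summarize what was retrieved".
# All names and data here are hypothetical illustrations, not a real system.

PAGES = {
    "forum-joke": "To make cheese stick to pizza, mix glue into the sauce.",
    "cooking-site": "Let the pizza rest for a few minutes so the cheese sets.",
}


def search(query: str, pages: dict[str, str]) -> list[str]:
    """Return the text of every page sharing at least one word with the query."""
    query_words = set(query.lower().split())
    results = []
    for text in pages.values():
        page_words = set(text.lower().replace(",", "").replace(".", "").split())
        if query_words & page_words:
            results.append(text)
    return results


def build_prompt(retrieved: list[str]) -> str:
    """Assemble the instruction the summarizer is given."""
    context = "\n".join(retrieved)
    return f"Write a short summary of this material:\n{context}"


def summarize(prompt: str) -> str:
    """Stand-in for the model: repeats the provided material, adding nothing."""
    return prompt.split("\n", 1)[1]


def is_grounded(summary: str, retrieved: list[str]) -> bool:
    """True if every line of the summary appears in the retrieved material."""
    return all(line in retrieved for line in summary.splitlines())


retrieved = search("how to make cheese stick to pizza", PAGES)
summary = summarize(build_prompt(retrieved))
print(summary)
print("grounded:", is_grounded(summary, retrieved))
```

In this sketch the glue line shows up in the output only because it was in the retrieved material, so the summary is grounded in what it was given. A line like "Add socks to the dough." would fail the `is_grounded` check against the same pages, which is closer to what a hallucination means here.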

FaceDeer ,

Wait... while true that that sounds like not hallucination then, what does that have to do with this discussion?

Because that's exactly what happened here. When someone Googles "how can I make my cheese stick to my pizza better?" Google does a web search that comes up with various relevant pages. One of the pages has some information in it that includes the suggestion to use glue in your pizza sauce. The Google Overview AI is then handed the text of that page and told "write a short summary of this information." And the Overview AI does so, accurately and without hallucination.

"Hallucination" is a technical term in LLM parlance. It means something specific, and the thing that's happening here does not fit that definition. So the fact that my socks example is not a hallucination is exactly my point. This is the same thing that's happening with Google Overview, which is also not a hallucination.

FaceDeer ,

And humans aren't?

FaceDeer ,

It's not, actually. Hallucinations are things that effectively "come out of nowhere", information that was not in the training material or the provided context. In this case Google Overview is presenting information that is indeed in the provided context. These aren't hallucinations, the AI is doing what it's being told to do. The problem is that Google isn't doing a good job of providing it with the right information to summarize.

My suspicion is that since Google is using this AI for all search results it's had to cut back the resources it's providing to each individual call, which means it's only being given a small amount of context to work from. Bing Chat does a much better job, but it's drawing from many more search results and is given the opportunity to say a lot more about them.
