
FaceDeer

@FaceDeer@kbin.social

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and is now exploring new vistas in social media.


FaceDeer ,

Especially given that this particular comment is 90% quotes from some other author.

FaceDeer ,

I find a ton of uses for quick Python scripts hammered out with Bing Chat to get random stuff done.

It's also super useful when brainstorming and fleshing out stuff for the tabletop roleplaying games I run. Just bounce ideas off it, have it write monologues, etc.

FaceDeer ,

We've got LLMs now that can do that. Sorry, you've been replaced. Please gather your things into this box and cheer up.

FaceDeer ,

The article opens:

When I first started colorizing photos back in 2015, some of the reactions I got were, well, pretty intense. I remember people sending me these long, passionate emails, accusing me of falsifying and manipulating history.

So this is hardly an AI-specific issue. It's always been something to be on guard for. As others in this thread have pointed out, Stalin was airbrushing political rivals out of photos back in the 30s. Heck, damnatio memoriae goes back as far as history itself does. Ancient Pharaohs would have the names of their predecessors chiseled off of monuments so they could "claim" them as their own work.

FaceDeer ,

Better than having people get convicted based on fake evidence, though.

FaceDeer ,

Yeah, but this doesn't put any restrictions on stuff, it just adds a label to it.

FaceDeer ,

I find that often "movements" end up focused more on just continuing their movement rather than the underlying purpose of why they started moving in the first place.

FaceDeer ,

If we don't have individual transportation how are we ever going to catch up to those goalposts?

FaceDeer ,

Rare-earth element is a specific technical term. Lithium is absolutely not among them.

One of the main sources lithium is extracted from is brines. That is, it's already in the water and we take it out.

FaceDeer ,

in the case of ai generated media, companies just decided that they just had the rights to use existing published media, so they harvested it without consent or compensation

Have you read the ToS of your favourite social media site lately?

In any event, it might well be that companies (and you yourself) have the rights to use existing published media to train AIs. Copyright doesn't cover the analysis of public data. I suspect that people wouldn't like it if copyright got extended to let IP owners prohibit you from learning from their stuff.

FaceDeer ,

You mean before or after all the sites updated their ToS so that they were legally in the clear to sell user posts to AI training companies?

The ToSes would generally have a blanket permission in them to license the data to third-party companies and whatnot. I went back through historical Reddit ToS versions a little while back and that was in there from the start.

Also in there was a clause allowing them to update their ToS, so even if the blanket permission wasn't there then it is now and you agreed to that too.

Learning from things is very obviously a completely different process to feeding data into a server farm.

It is not very obviously different, as evidenced by the fact that it's still being argued. There are some legal cases before the courts that will clarify this in various jurisdictions but I'm not expecting them to rule against analysis of public data.

FaceDeer ,

you know that a company putting a thing in their terms of service doesn't make it legally binding, right?

And you know that doesn't necessarily imply the reverse? Granting a site a license to use the stuff you post there is a pretty basic and reasonable thing to agree to in exchange for them letting you post stuff there in the first place.

hence why they all suddenly felt the need to update their terms of services

As others have been pointing out to you in this thread, that also is not a sign that the previous ToS didn't cover this. They're just being clearer about what they can do.

Go ahead and refrain from using their services if you don't agree to the terms under which they're offering those services. Nobody's forcing you.

FaceDeer ,

"Prompt engineering" is simply the skill of knowing how to correctly ask for the thing that you want. Given that this is something that is in rare supply even when interacting with other humans, I don't see this going away until we're well past AGI and into ASI.

FaceDeer ,

And while it's probably true that "we're not ready", we're never going to become ready until the tech actually arrives and forces us to do that.

FaceDeer ,

Indeed. Firefox already has "sponsored links" and such in the built-in homepage, I simply disable those when I first install it and get on with life.

Big projects like Firefox need big money to support them. If you don't want Firefox to be beholden to Google, it needs to find ways to earn some on its own.

FaceDeer ,

It says "opt-out" in the title.

FaceDeer ,

"They broke the law fair and square" is an odd defence.

FaceDeer ,

You have misunderstood me. You said "Apple spent twenty years building the ecosystem Spotify and Epic want to exploit for free." I'm pointing out that the amount of effort Apple put into building the ecosystem is immaterial to whether they're doing illegal things with it.

FaceDeer ,

No, we're describing a human endeavour. If the promotional flyers had been made by outsourcing it to Fiverr and they came back wonky it would have been the same basic problem. They outsourced this and then either didn't have the resources for, or the interest in, checking the work that came back.

FaceDeer ,

You missed "techbro grifter scam" from your list of buzzwords.

FaceDeer ,

This article is from June 12, 2023. That's practically stone-aged as far as AI technology has been progressing.

The paper it's based on used a very simplistic approach, training each AI purely on the outputs of its previous "generation." Turns out that's not a realistic real-world scenario, though. In reality AIs can be trained on a mixture of human-generated and AI-generated content, and that can actually turn out better than training on human-generated content alone. AI-generated content can be curated and custom-made to be better suited to training, and the human-generated stuff adds back in the edge cases that might disappear when doing repeated training generations.

Fanfiction Community Rocked By Etsy Sellers Turning Their Work Into Bound Books (www.404media.co)

Etsy sellers are turning free fanfiction into printed and bound physical books, and listing them for sale on online marketplaces for more than $100 per book. It’s a problem that’s rattling the authors of those fanfics, as well as their fans and readers....

FaceDeer ,

Indeed, this is a common misunderstanding of the status of fanworks. Most fanfics likely violate the copyright of the IP they're based on, but that doesn't mean that they aren't themselves original copyrighted works. The original IP's rightsholders can't simply claim the fanfic's copyright for themselves. It likely means that each party would need the other party's permission to make legal copies of the fanfic.

This is why most studios or authors will refuse to even read unsolicited ideas that are sent to them: they don't want to end up in a bind if someone sends them a fanfic with elements in it that they already intended to use in future books or episodes and then sues them for "stealing" their work.

FaceDeer ,

I'm a big fan of fanfic, I support it and consider it a serious literary genre. It's basically the folklore of our modern times. I'm also not a fan of how extensive and restrictive copyright protection has become.

That said, I do find it amusingly ironic when fanfic authors get in a big huff about their copyright being violated.

FaceDeer ,

Famously, "50 Shades of Grey" started out as a Twilight fanfic. The author later pulled out all of the Twilight-related stuff and then it was free and clear to publish as their own work. Given how much money 50 Shades raked in I would imagine there's been some legal scrutiny there from various sides.

FaceDeer ,

Indeed. I frequently use LLMs as brainstorming buddies while working on creative things, like RPG adventure planning and character creation. I want the AI to come up with new and unexpected things that never existed before.

If I have need of the AI to account for "ground truths" then I use things like retrieval-augmented generation or database plugins that inject that stuff into the context.

FaceDeer ,

Have you not experimented with LLMs? They come up with new things all the time.

FaceDeer ,

People's heights change over time too. Men and women can nevertheless have different average heights.

FaceDeer ,

I'd be very interested in those results too, though I'd want everyone to bear in mind the possibility that the brain could have many different "masculine" and "feminine" attributes that could be present in all sorts of mixtures when you range afield from whatever statistical clusterings there might be. I wouldn't want to see a situation where a transgender person is denied care because an AI "read" them as cisgender.

In another comment in this thread I mentioned how men and women have different average heights; that would be a good analogy. There are short men and tall women, so you shouldn't rely on just that.

FaceDeer ,

That just makes my point stronger, though. The basic gist of what I was saying is that even if there is a statistical clustering of data into two groups that seem correlated with some category, that doesn't mean that you can absolutely rely on that data to classify people into those categories.
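To put a rough number on the height analogy, here's a minimal sketch. It assumes two equally sized groups whose values are normally distributed with the same spread, and the means and standard deviation are made-up illustrative values, not real measurements. It computes how often a simple "cutoff halfway between the averages" rule would put someone in the wrong group.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def midpoint_error_rate(mean_a, mean_b, sd):
    """Share of people misclassified by a cutoff halfway between the two means.

    Assumes both groups are normal with the same standard deviation and are
    the same size, so the error rate is identical for each group.
    """
    separation = abs(mean_b - mean_a) / sd
    return 1.0 - normal_cdf(separation / 2.0)


# Illustrative, assumed values in centimetres (not real survey data).
error = midpoint_error_rate(mean_a=165.0, mean_b=178.0, sd=7.0)
print(f"Misclassified by the midpoint cutoff: {error:.1%}")
```

Even with averages that are clearly different, a noticeable share of individuals land on the "wrong" side of the cutoff, which is the whole point: a group-level clustering isn't a reliable test for any one person.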

Tumblr and Wordpress to Sell Users’ Data to Train AI Tools (www.404media.co)

this could not be timed worse for Tumblr which is in huge hot water with its userbase already for its CEO breaking his sabbatical to ban a prominent trans user for allegedly threatening him (in a cartoonish manner), and then spending a week personally justifying it increasingly wildly across several platforms. the rumors had...

FaceDeer ,

They're giving you services in exchange for your content.

Does nobody even think about TOS any more? You don't have to read any specific one, just realize the basic universal truth that no website is going to accept your content without some kind of legal protection that allows them to use that content.

FaceDeer ,

Are you serious? We're speaking in the Fediverse right now. It's notable in its difference. Though instances have their own TOSes, so it'd be pretty trivial to set one up to harvest content for AI training as well.

FaceDeer ,

Hardly. They earn money by being paid by their users, but they can earn more money by being paid by their users and also selling their users' data. The goal is more money, so it makes sense for them to do that. It's not crazy.

From the WordPress Terms of Service:

License. By uploading or sharing Content, you grant us a worldwide, royalty-free, transferable, sub-licensable, and non-exclusive license to use, reproduce, modify, distribute, adapt, publicly display, and publish the Content solely for the purpose of providing and improving our products and Services and promoting your website. This license also allows us to make any publicly-posted Content available to select third parties (through Firehose, for example) so that these third parties can analyze and distribute (but not publicly display) the Content through their services.

Emphasis added. They told you what they could do with the content you gave them, you just didn't listen.

I'm sorry if I'm coming across harsh here, but I'm seeing this same error being made over and over again. It's being made frequently right now thanks to the big shakeups happening in social media and the sudden rise of AI, but I've seen it sporadically over the decades that I've been online. So it bears driving home:

  • If you are about to give your content to a website, check their terms of service before you do to see if you're willing to agree to them, and if you don't agree then don't give your content to that website. It's true that some ToS clauses may not be legally enforceable, but are you willing to fight that in court? If you didn't consider your content valuable enough to spend the time checking the ToS when you posted it, that's not WordPress's fault.
  • If you give someone something and they later find a way to make the thing you gave them valuable, it's too late. You gave it to them. They don't owe you a "cut." Check the terms of service.

FaceDeer ,

I wouldn't really trust that promise, frankly. I just checked their terms of service and it has the usual clause:

You must own all rights, title, and interest, including all intellectual property rights, in and to, the User Content you make available on the Services. ASSC requires licenses from you for that User Content to operate the Services. By posting User Content on the Services, you grant ASSC a royalty-free, perpetual, irrevocable, non-exclusive, sublicensable, worldwide license to use, reproduce, distribute, perform, publicly display or prepare derivative works of your User Content.

Which isn't really surprising, it's standard boilerplate for a reason. They don't want to be caught in a situation where they can't function legally any more. They say they won't sell the company or your data, and they might even believe that right now, but who knows what the future might bring? They have the ability to do so if the circumstances arise.

FaceDeer ,

Well, a large part of my frustration stems from the "I've seen this for decades" part - longer than many of the people who are now raising a ruckus have been alive. So IMO it's always been this way and the "social contract we've adapted to" is "the social contract that we imagined existed despite there being ample evidence there was no such thing." I'm so tired of the surprised-pikachu reactions.

Combined with the selfish "wait a minute, the stuff I gave away for fun is worth money to someone else now? I want money too! Or I'm going to destroy my stuff so that nobody gets any value out of it!" reactions, I find myself bizarrely ambivalent and not exactly on the side of the common man vs. the big evil corporations this time.

FaceDeer ,

I'm just venting, really. I know it's not going to make a real difference.

I suppose if you go waaaay back it was different, true. Back in the days of Usenet (as a discussion forum rather than as the piracy filesharing system it's mostly used for nowadays) there weren't these sorts of ToS on it and everything got freely archived in numerous different places because that's just how it was. It was the first Fediverse, I suppose.

The ironic thing is that kbin.social's ToS has no "ownership" stuff in it either. For now, at least, the new ActivityPub-based Fediverse is in the same position that Usenet was - I assume a lot of the other instances also don't bother with much of a ToS and the posts get shared around beyond any one instance's control anyway. So maybe this grumpy old-timer may get to see a bit of the good old days return, for a little while. That'll be nice.

FaceDeer ,

If it makes you feel better, the thing that annoys me most is not so much that this is happening but more how everybody is suddenly surprised by it and complaining about it. The data-harvesting itself doesn't really harm anyone.

FaceDeer ,

A user's data still belongs to the user when they post it on sites like Reddit and such, too. The ToS doesn't take ownership away from them, at least not in any case that I've seen. It just gives the site the license to use it as well.

FaceDeer ,

You could ask a lawyer, I suppose. But the basic gist of this is "we don't know what we might need to do with this data in the future, so we put 'we can do anything with this data' into the ToS so that we know that if the need arises we won't find ourselves unable to do what we need to do with it." Any website that doesn't do this could find itself unable to implement new features or comply with new laws they didn't think of when crafting the original ToS.

At the very minimum a ToS needs to have some way to update and apply retroactively to old data, which ends up being "we can do anything with this data" with extra steps.

FaceDeer ,

No problem. I'm not a lawyer myself, mind you, but I've encountered issues like these enough times over the years that I feel I've got a pretty good layman's grasp. Plus I've actually read some of these ToSes and considered them from the perspective of the company running the site, which I suspect most people arguing about this stuff haven't actually done.

I wish the Fediverse sites running without rigorous ToSes well, of course, but I suspect failing to establish clear rights to use the content people post on them is likely to end up biting them in the long run. At least the bigger ones. Hobby-level websites get away with a lot because they don't have significant money on the line.

FaceDeer ,

It's true, go ahead and read the ToS. It only grants a license to Reddit to use your content. It explicitly says:

You retain any ownership rights you have in Your Content, but you grant Reddit the following license to use that Content:

And then goes on to enumerate what you're licensing them to do with it. There's also a section titled "Changes to these Terms" about how they can change the ToS going forward.

FaceDeer ,

I use quotation marks there because what is often referred to as AI today is not whatsoever what the term once described.

The field of AI has been around for decades and covers a wide range of technologies, many of them much "simpler" than the current crop of generative AI. What is often referred to as AI today is absolutely what the term once described, and still does describe.

What people seem to be conflating is the general term "AI" and the more specific "AGI", or Artificial General Intelligence. AGI is the stuff you see on Star Trek. Nobody is claiming that current LLMs are AGI, though they may be a significant step along the way to that.

I may be sounding nitpicky here, but this is the fundamental issue that the article is complaining about. People are not well educated about what AI actually is and what it's good at. It's good at a huge amount of stuff, it's really revolutionary, but it's not good at everything. It's not the fault of AI when people fail to grasp that, any more than it's the fault of the car when someone gets into it and then is annoyed it won't take them to the Moon.

FaceDeer ,

I didn't say that everything in Star Trek was AGI, just that you can find examples there.

FaceDeer ,

Not to mention that a response "containing" plagiarism is a pretty poorly defined criterion. The system being used here is proprietary so we don't even know how it works.

I went and looked at how low theater and such were and it's dramatic:

The lowest similarity scores appeared in theater (0.9%), humanities (2.8%) and English language (5.4%).

FaceDeer ,

Article mentioned 400-word chunks, so much less than paper-sized.

FaceDeer ,

Seems like it wouldn't really matter who he tested it on.

FaceDeer ,

Call it whatever makes you feel happy, it is allowing me to accomplish things much more quickly and easily than working without it does.

FaceDeer ,

Indeed, and many of the more advanced AI systems currently out there are already using LLMs as just one component. Retrieval-augmented generation, for example, adds a separate "memory" that gets searched and bits inserted into the context of the LLM when it's answering questions. LLMs have been trained to be able to call external APIs to do the things they're bad at, like math. The LLM is typically still the central "core" of the system, though; the other stuff is routine sorts of computer activities that we've already had a handle on for decades.

IMO it still boils down to a continuum. If there's an AI system that's got an LLM in it but also a Wolfram Alpha API and a websearch API and other such "helpers", then that system should be considered as a whole when asking how "intelligent" it is.
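For a rough picture of what those "helpers" look like, here's a toy sketch. Everything in it is hypothetical: the notes in `MEMORY`, the word-overlap scoring, and the `calculator` helper are stand-ins I made up for illustration, and no actual LLM is called. Real systems typically use embedding search rather than word overlap, but the shape is the same: search a separate memory, paste the hits into the context, and hand off things like math to ordinary code.

```python
import operator
import re
from collections import Counter

# Hypothetical "memory" of ground truths (invented for this sketch).
MEMORY = [
    "The town of Eldermoor is ruled by Baroness Vey.",
    "The Silver Hart inn burned down last winter.",
    "Goblins in the northern hills trade in copper, not gold.",
]


def tokenize(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def score(query, note):
    """Crude relevance score: number of words the query and note share."""
    q, n = tokenize(query), tokenize(note)
    return sum(min(q[word], n[word]) for word in q)


def retrieve(query, memory, k=1):
    """Return the k notes that best match the query."""
    ranked = sorted(memory, key=lambda note: score(query, note), reverse=True)
    return ranked[:k]


def build_prompt(question, memory, k=1):
    """Insert the retrieved notes into the context that would go to the LLM."""
    notes = "\n".join(f"- {note}" for note in retrieve(question, memory, k))
    return f"Use these notes:\n{notes}\n\nQuestion: {question}"


def calculator(expression):
    """Hypothetical math helper; only understands 'a <op> b'."""
    a, op, b = expression.split()
    ops = {"+": operator.add, "-": operator.sub,
           "*": operator.mul, "/": operator.truediv}
    return ops[op](float(a), float(b))


prompt = build_prompt("Who rules Eldermoor?", MEMORY)
print(prompt)
print(calculator("6 * 7"))
```

The retrieval and the calculator are the boring decades-old computing part; the LLM would only ever see the finished `prompt` string and the helper's result.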
