OpenAI says it’s “impossible” to create useful AI models without copyrighted material

Apparently, stealing other people's work to create a product for money is now "fair use" according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

sub_ ,

https://petapixel.com/2024/01/03/court-docs-reveal-midjourney-wanted-to-copy-the-style-of-these-photographers/

What's stopping AI companies from paying royalties to artists they ripped off?

Also, lol at accounts created within the last few hours just to reply in this thread.

The moment their own works are the ones stolen by big companies and they get driven out of business, watch their tune change.

Edit: I remember when Reddit had that shitshow, and all of a sudden a lot of sock / bot accounts appeared. I wasn't expecting it to happen here, but I guess the election cycle is near.

sanzky ,

What’s stopping AI companies from paying royalties to artists they ripped off?

profit. AI is not even a profitable business now. They exist because of the huge amount of investment being poured into it. If they have to pay their fair share they would not exist as a business.

what OpenAI says is actually true. The issue IMHO is the idea that we should give them a pass to do it.

sub_ ,

Uber wasn't making a profit either, despite all the VC money behind it.

I guess they have reasons not to pay drivers properly. Give Uber a free pass for it too.

Haus ,

Try to train a human comedian to make jokes without ever allowing him to hear another comedian's jokes, never watching a movie, never reading a book or magazine, never watching a TV show. I expect the jokes would be pretty weak.

Phanatik ,

A comedian isn't forming a sentence based on which word is most likely to appear after the previous one. This is such a bullshit argument that reduces human competency to "monkey see thing to draw thing" and completely overlooks the craft and intent behind creative works. Do you know why ChatGPT uses certain words over others? Probability. It decided as a result of its training that one word would appear after the previous in certain contexts. It absolutely doesn't take into account things like "maybe this word would be better here because the sound and syllables maintain the flow of the sentence".

Baffling takes from people who don't know what they're talking about.

DaDragon ,

That’s what humans do, though. Maybe not probability directly, but we all know that some words should be put in a certain order. We still operate within standard norms that apply to a particular group of people. LLMs just go about it in a different way, but they achieve the same general result. If I’m drawing a human, that means there’s a ‘hand’ here, and a ‘head’ there. ‘Head’ is a weird combination of pixels that mostly look like this, ‘hand’ looks kinda like that. All depends on how the model is structured, but tell me that’s not very similar to a simplified version of how humans operate.

Phanatik ,

Yeah, but the difference is we still choose our words. We can still alter sentences on the fly. I can think of a sentence and understand verbs go after the subject, but I still have the cognition to alter the sentence to have the effect I want. The thing lacking in LLMs is intent, and I'm yet to see anyone tell me why a generative model decides to draw more than 6 fingers. As humans we know hands generally have five fingers, and that some people's don't, so if we wanted to draw a person with a different number of fingers, we could. A generative art model can't help itself from drawing extra fingers because all it understands is that "finger + finger = hand", but it has no concept of when to stop.

DaDragon ,

And that’s the reason why LLM generated content isn’t considered creative.

I do believe that the person using the device has a right to copyright the unique method they used to generate the content, but the content itself isn’t anything worth protecting.

sculd OP ,

Some relevant comments from Ars:

leighno5

The absolute hubris required for OpenAI here to come right out and say, 'Yeah, we have no choice but to build our product off the exploitation of the work others have already performed' is stunning. It's about as perfect a representation of the tech bro mindset that there can ever be. They didn't even try to approach content creators in order to do this, they just took what they needed because they wanted to. I really don't think it's hyperbolic to compare this to modern day colonization, or worker exploitation. 'You've been working pretty hard for a very long time to create and host content, pay for the development of that content, and build your business off of that, but we need it to make money for this thing we're building, so we're just going to fucking take it and do what we need to do.'

The entitlement is just...it's incredible.

4qu4rius

20 years ago, high school kids were sued for millions & years in jail for downloading a single Metallica album (if I remember correctly, minimum damages in the US were something like $500k per song).

All of a sudden, just because they are the dominant ones doing the infringement, they should be allowed to scrape the entire (digital) human knowledge? Funny (or not) how the law always benefits the rich.

noorbeast ,

I will repeat what I have proffered before:

If OpenAI states that it is impossible to train leading AI models without using copyrighted material, then, unpopular as it may be, the preemptive pragmatic solution should be pretty obvious: enter into commercial arrangements for access to said copyrighted material.

Claiming a failure to do so in circumstances where the subsequent commercial product directly competes in a market seems disingenuous at best, given what I assume is the purpose of copyrighted material, that being to set the terms under which public facing material can be used. Particularly if regurgitation of copyrighted material seems to exist in products inadequately developed to prevent such a simple and foreseeable situation.

Yes, I am aware of the USA concept of fair use, but the test of that should be manifestly reciprocal. For example, would Meta allow what it did to MySpace (hack it and allow easy user transfer), or would Google allow scraping of YouTube?

To me it seems Big Tech wants to have its cake and eat it too, where investor $$$ are used to corrupt open markets, undermine fundamental democratic State social institutions, manipulate legal processes, and erode basic consumer rights.

vexikron ,

Yep, completely agree.

Case in point: Steam has recently clarified their policies on using such AI-generated material that draws on essentially billions of both copyrighted and non-copyrighted texts and images.

To publish a game on Steam that uses AI gen content, you now have to verify that you as a developer are legally authorized to use all training material for the AI model for commercial purposes.

This also applies to code and code snippets generated by AI tools that function similarly, such as Copilot.

So yeah, sorry, either gotta use MIT-licensed open source code or write your own, and you gotta do your own art.

I imagine this would also prevent you from using AI-generated voice lines where you trained the model on basically anyone who did not explicitly consent to this, but voice gen software that doesn't use the 'train the model on human speakers' approach would probably be fine, assuming you have the relevant legal rights to use such software commercially.

Not 100% sure this is Steam's policy on voice gen stuff; they focused mainly on art, dialogue, and code in their latest policy update, but the logic seems to work out to this conclusion.
