In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From

CosmoNova , 3 months ago

I almost want to believe they legitimately neither know nor care that they’re committing a gigantic data and labour heist, but the truth is they know exactly what they’re doing and they rub our noses in it.

laxe , 3 months ago

Of course they know what they’re doing. Everybody knows this, how could they be the only ones that don’t?

Bogasse , 3 months ago

Yeah, the fact that AI progress just relies on "we will make so much money that no lawsuit will meaningfully affect our growth" is really infuriating. The fact that the general public apparently doesn't care is even more infuriating.

A_Very_Big_Fan , 3 months ago

Look guys! I'm stealing from Tolkien!

Guntrigger , 3 months ago

I don't think anyone's going to pay for your version of ChatGPT

toddestan , 3 months ago

I'd say not really, Tolkien was a writer, not an artist.

What you are doing is violating the trademark Middle-Earth Enterprises has on the Gandalf character.

A_Very_Big_Fan , 3 months ago

The point was that I absorbed that information to inform my "art", since we're equating training with stealing.

I guess this would have been a better example lol. It's clearly not Gandalf, but I wouldn't have ever come up with it if I hadn't seen that scene

redditReallySucks , 3 months ago

https://lemmy.dbzer0.com/pictrs/image/22e89104-6fa0-46fe-83ed-3485b69475ca.png

I hope this is gonna become a new meme template

driving_crooner , 3 months ago

She looks like she just talked to the waitress about a fake rule about eating nachos and got caught by her date.

bigMouthCommie , 3 months ago

this is incomprehensible to me. can you try it with two or three sentences?

driving_crooner , 3 months ago

Her date was eating all the fully loaded nachos, so she went up and asked the waitress to make up a rule about how one person cannot eat all the nachos with meat and cheese. But her date knew that rule was bullshit and called her out on it. She's trying to look confused and sad because they're going to be too early for the movie.

bigMouthCommie , 3 months ago

thank you. it must be a reference to something, but i don't watch tv any more.

datavoid , 3 months ago

I think you should leave...

(is what you would search to find this)

JWBananas , 3 months ago

I'm sorry, what does this have to do with Coffin Flops. Does this mean it isn't getting cancelled?

swab148 , 3 months ago

I DIDN'T RIG SHIT!

RatsOffToYa , 3 months ago

Not sure what's funnier: your first comment or the comment explaining it to someone who's obviously not part of a turbo team.

fjordbasa , 3 months ago

Turbo team?? Did you replace my toilet with one that looks the same but has a joke hole? That’s just FOR FARTS??

RatsOffToYa , 3 months ago

Look until you're part of the turbo team.... WALK SLOWLY

fjordbasa , 3 months ago

Fine… I’ll lay down to be by myself and read my art books!

uninvitedguest , 3 months ago

What?! What the hell are you talking about?!

squid_slime , 3 months ago

Chatgpt, you okay? 😅

Thcdenton , 3 months ago

https://lemmy.world/pictrs/image/e7677152-0a06-4558-9752-c277d3392962.jpeg

Plopp , 3 months ago

Lmao that's wonderful, scrolling down from those weird ass comments only to be greeted by my own exact facial expression.

Buttons , 3 months ago

"No... Hell no... Man, I believe you'd get your ass kicked if you said something like that..."

whoisearth , 3 months ago

Coffeezilla had a video in his void where he plays this back a few times. It's hilarious seeing the guilt without stating it.

andrew_bidlaw , 3 months ago

Funny she didn't talk it over with lawyers before that. That's a bad way to answer that.

driving_crooner , 3 months ago

Or she did, and the lawyers told her to feign ignorance.

andrew_bidlaw , 3 months ago

Maybe, but it sounds very weak.

anlumo , 3 months ago

Lawyers aren’t PR people.

andrew_bidlaw , 3 months ago

She didn't even address them though.

QuaternionsRock , 3 months ago

It probably means that they don’t scrape and preprocess training data in house. She knows they get it from a garden variety of underpaid contractors, but she doesn’t know the specific data sources beyond the stipulations of the contract (“publicly available or licensed”), and she probably doesn’t even know that for certain.

driving_crooner , 3 months ago

"Publicly available" can mean a lot of things. Is YouTube publicly available? Is public broadcasting publicly available?

_haha_oh_wow_ , 3 months ago

Gee, seems like something a CTO would know. I'm sure she's not just lying, right?

Bogasse , 3 months ago

And on the other hand, it is a very obvious question to expect. If you have something to hide, how in the world are you not prepared for this question!? 🤡

Hotzilla , 3 months ago

To be fair, these datasets are one of their biggest competitive edges. But telling an interviewer "I cannot tell you" is not very nice, so you can take the American politician approach and say "I don't know/remember", which you can never be held accountable for.

VirtualOdour , 3 months ago

It's a question based on a purposeful misunderstanding of the technology; it's like expecting a beekeeper to know each bee's name and bedtime. Really, it's like asking a bricklayer where each brick in the pile came from. He can tell you the batch, but he's not going to know that this brick came from the fourth row of the sixth pallet, two from the left. There is no reason to remember that; it's not important to anyone.

They don't log it because it would take huge amounts of resources and gain nothing.

zaphod , 3 months ago (edited 3 months ago)

What?

Compiling quality datasets is enormously challenging and labour intensive. OpenAI absolutely knows the provenance of the data they train on as it's part of their secret sauce. And there's no damn way their CTO won't have a broad strokes understanding of the origins of those datasets.

Guntrigger , 3 months ago

[Citation needed]

PoliticallyIncorrect , 3 months ago

Watching a video or reading an article isn't copyright infringement when a human does it, so why is it when an "AI" does it? I believe the copyright infringement is committed through the prompt, so by the user, not the tool.

uninvitedguest , 3 months ago

When a school professor "prompts" you to write an essay and you, the "tool" go consume copyrighted material and plagiarize it in the production of your essay is the infringement made by the professor?

PoliticallyIncorrect , 3 months ago

If you quote the sources and write it with your own words I believe it isn't, AFAIK "AI" already do that.

uninvitedguest , 3 months ago

It definitely does not cite sources and use its own words in all cases - especially in visual media generation.

And in the proposed scenario I did write the student plagiarizes the copyrighted material.

PoliticallyIncorrect , 3 months ago (edited 3 months ago)

If you read a book or watch a movie and get inspired by it to create something new and different, is it plagiarism and copyright infringement?

If that were the case, the majority of stuff nowadays would be plagiarism and copyright infringement. I mean, people generally get inspired by someone or something.

buffaloseven , 3 months ago

There’s a long history of this and you might find some helpful information in looking at “transformative use” of copyrighted materials. Google Books is a famous case where the technology company won the lawsuit.

The real problem is that LLMs constantly spit out copyrighted material verbatim. That’s not transformative. And it’s a near-impossible problem to solve while maintaining the utility. Because these things aren’t actually AI, they’re just monstrous statistical correlation databases generated from an enormous data set.

Much of the utility from them will become targeted applications where the training comes from public/owned datasets. I don’t think the copyright case is going to end well for these companies…or at least they’re going to have to gradually chisel away parts of their training data, which will have an outsized impact as more and more AI generated material finds its way into the training data sets.

stephen01king , 3 months ago

How constantly does it spit out copyrighted material? Is there data on that?

buffaloseven , 3 months ago

There's more and more research starting to happen on it, but I've seen anywhere from 20% to 60% of responses. Here's a recent study where they explicitly try to coerce LLMs to break copyright: https://www.patronus.ai/blog/introducing-copyright-catcher

I don't have the time to grab them right now, but in many of the lawsuits brought forward against companies developing LLMs, their openings contain some statistics gathered on how frequently they infringed by returning copyrighted material.

potustheplant , 3 months ago

You do realize that AI is just a marketing term, right? None of these models learn, have intelligence or create truly original work. As a matter of fact, if people don't continue to create original content, these models would stagnate or enter a feedback loop that would poison themselves with their own erroneous responses.

AIs don't think. They copy with extra steps.

PoliticallyIncorrect , 3 months ago (edited 3 months ago)

I know AI is just a marketing term; I usually put it in quotes when I write it. But isn't that what real human intelligence does too? You don't create things from nowhere; people usually draw on different sources to reach a conclusion. I believe that's exactly what "AI" does, it just speeds up the process. Instead of spending 30 minutes reading about some random topic, you ask the "AI" and it does it in 20 seconds. If you need an instant answer to something, I think it is pretty useful.

I know it doesn't think by itself, but it speeds up the process of searching for objective stuff on the internet.

For example, for psychological research it will suck, of course, but for speeding up a search for polls of the population it could be pretty useful.

potustheplant , 3 months ago

Except that the information it gives you is often objectively incorrect and it makes up sources (this happened to me a lot of times). And no, it can't do what a human can. It doesn't interpret the information it gets and it can't reach new conclusions based on what it "knows".

I honestly don't know how you can even begin to compare an LLM to the human brain.

Tja , 3 months ago

So your question is "is plagiarism plagiarism"?

uninvitedguest , 3 months ago

No, that is not the question nor a reasonable interpretation of it.

ominouslemon , 3 months ago

Copilot lists its sources. The problem is half of them are completely made up and if you click on the links they take you to the wrong pages

Drewelite , 3 months ago

This is what people fundamentally don't understand about intelligence, artificial or otherwise. People feel like their intelligence is 100% "theirs". While I certainly would advocate that a person owns their intelligence, It didn't spawn from nothing.

You're standing on the shoulders of everyone that came before you. You take a prehistoric man or an alien that hasn't had any of the same experiences you've had, they won't be able to function in this world. It's not because they are any dumber than you. It's because you absorbed the hive mind of the society you live in. Everyone's racing to slap their brand on stuff to copyright it to get ahead and carve out their space.

"No you can't tell that story, It's mine."
"That art is so derivative."

But copyright was only meant to protect something for a short period in order to monetize it; to adapt the value of knowledge for our capital market. Our world can't grow if all knowledge is owned forever and isn't able to be used when even THINKING about new ideas.

ANY VERSION OF INTELLIGENCE YOU WOULD WANT TO INTERACT WITH MUST CONSUME OUR KNOWLEDGE AND PRODUCE TRANSFORMATIONS OF IT.

That's all you do.

Imagine how useless someone would be who'd never interacted with anything copyrighted, patented, or trademarked.

raspberriesareyummy , 3 months ago

That's not a very agreeable take. Just get rid of patents and copyrights altogether and your point dissolves itself into nothing. The core difference being derivative works by humans can respect the right to privacy of original creators.

Deep learning bullshit software, however, will just regurgitate creators' content, sometimes unrecognizably, but sometimes it will outright steal their likeness or individual style to create content that may be associated with the original creators.

What you are in effect doing is likening learning from the ideas of others to a deep learning "AI" using images to create revenge porn, to give a drastic example.

Drewelite , 3 months ago (edited 3 months ago)

Yes. Your last sentence is my point exactly. LLMs haven't replicated everything about the human brain. But the hype is here because it cracks one of our brain's key features: how it learns. Your brain isn't magic. It just records training data until it has enough to mash it together into different things.

A child doesn't respect copyright, they'll draw a picture of Mario. You probably would too If I asked you to. Respecting copyright is something we learn to do in specific situations. This is called "coming up with an original idea". But that's bullshit. There are no original ideas.

If you come up with a product that's a cold brew cup that refrigerates its contents, I'd say that's a very original idea. But you didn't come up with refrigeration, you didn't come up with cups, or cold brew, or the idea of putting technology in a cup, or the concept of a product you sell to people. Name one thing about this idea that you didn't learn somewhere else? You can't. Because that's not how people work. A very real part of business, that you will learn as you put your new cup to market, is skirting around copyright. Somebody out there with a heated cup might come after you for example.

This is a difficult thing to learn the precise line on. Mostly because it can't work as a concrete rule. AI still has to be used, tested, and developed to learn the nuances here. And it will. But what baffles me is how my example above outlines how every process of invention has worked since the beginning of humanity. But if an LLM does it, people say, "That's not a real idea. It just took a bunch of stuff it's learned and mashed it together." But I hear, "My brain is 🪄magic✨ I'm special."

rottingleaf , 3 months ago

Yes, so how come all these arguments were not popular before the current hype about text generators?

Have some integrity.

dezmd , 3 months ago

They absolutely were, the entire time. You just didn't have interest in hearing about it and weren't engaged on it.

Learn what integrity means if you want to use it as a snarky one liner.

Have some common sense.

rottingleaf , 3 months ago

They absolutely were, the entire time. You just didn’t have interest in hearing about it and weren’t engaged on it.

Why express your opinion on subjects where it's not worth anything?

You are saying these mutated cryptobros cared about copyright and patent laws being obsolete and harmful before "AI"?

Learn what integrity means if you want to use it as a snarky one liner.

I know what every word I use means

topinambour_rex , 3 months ago

What is this human going to do with what they read? Are they going to produce something using part of this book or article?

If yes, that's copyright infringement.

echo64 , 3 months ago

If you read an article, then copy parts of that article into a new article, that's copyright infringement. Same with AIs.

anlumo , 3 months ago

Depends on how much is copied, if it’s a small amount it’s fair use.

echo64 , 3 months ago

Fair use depends on a lot, and just being a small amount doesn't factor in. It's the actual use. Small amounts just often fly under the radar of legal teams.

FireTower , 3 months ago

Fair use is a four-factor test. The amount used is one factor, but a small amount being used doesn't strictly mean something is fair use. You could use a single frame of a movie and have it not qualify as fair use.

Prandom_returns , 3 months ago

Because it's software.

Drewelite , 3 months ago

How do you expect people will create AI if it can't do the things we do, when "doing the things we do" is the whole point?

Prandom_returns , 3 months ago

I never want software to impersonate a human.

Fisk400 , 3 months ago

They know what they fed the thing. Not backing up their own training data would be insane. They are not insane, just thieves

echodot , 3 months ago

Everyone says this, but the truth is copyright law has been unfit for purpose for well over 30 years now. When the laws were written, no one expected something like the internet to ever come along, and they certainly didn't expect something like AI. We can't just keep applying the same old copyright laws to new situations when they already don't work.

I'm sure they did illegally obtain the work but is that necessarily a bad thing? For example they're not actually making that content available to anyone so if I pirate a movie and then only I watch it, I don't think anyone would really think I should be arrested for that, so why is it unacceptable for them but fine for me?

oKtosiTe , 3 months ago

if I pirate a movie and then only I watch it, I don't think anyone would really think I should be arrested for that

There are definitely people out there that think you should be arrested for that.

echodot , 3 months ago

Even the police are unsure if it's actually a crime though. Crimes require someone to lose something and no one can point to a lost product so it's difficult to really quantify.

And it's not even technically breach of copyright since you're not selling it.

exanime , 3 months ago

But they ARE selling it ... Every answer Chat GPT makes came from possibly stolen material

HaywardT , 3 months ago

Isn't that true of every opinion you have. All the knowledge you have is based on works of others that came before you.

exanime , 3 months ago

Not until I bill you for it

Also, no there is such a thing as an original thought or opinion... Even if it's informed on other knowledge

There is a difference between reinterpreting other knowledge and just Frankensteining multiple works together

HaywardT , 3 months ago

I don't know enough about LLMs but Neural networks are capable of original thought. I suspect LLMs are too because of their relationship to Neural Networks.

confusedbytheBasics , 3 months ago

You're using the word 'stolen' which doesn't fit. It would be accurate to say 'every answer comes from possibly unlicensed material '.

exanime , 3 months ago

Yeap, the real term (I think) would be copyright infringement

Guntrigger , 3 months ago

Allegedly possibly maybe accidentally whoopsie not quite licensed fully material.

rottingleaf , 3 months ago

That is a bad thing if they want to be exempt from the law because they are doing a big, very important thing, and we shouldn't.

The copyright laws are shit, but applying them selectively is orders of magnitude worse.

GiveMemes , 3 months ago

Ok but training an ai is not equivalent to watching a movie. It's more like putting a game on one of those 300 games in one DS cartridges back in the day.

Gabu , 3 months ago

The problem with that being?

GiveMemes , 3 months ago

Obviously, it's illegal to sell a product that's using copyrighted material you don't have the copyright to. This AI is not open source, it's a for profit system.

A_Very_Big_Fan , 3 months ago

It doesn't, though. You could have easily checked yourself, but I guess I'll do your research for you.

GiveMemes , 3 months ago

It does though. You could have easily checked for yourself, but I guess I'll do your research for you.

https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

A_Very_Big_Fan , 3 months ago

That article doesn't even claim it's distributing copyrighted material.

If that qualifies as distributing stolen copyrighted material, then this is stealing and distributing the "you shall not pass" LoTR scene. Which, again, ChatGPT won't even do

GiveMemes , 3 months ago

Sorry, I know reading the whole article is hard:

The complaint cites several examples when a chatbot provided users with near-verbatim excerpts from Times articles that would otherwise require a paid subscription to view.

A_Very_Big_Fan , 3 months ago

Yeah lmao after like 20 paragraphs of nothing, it wasn't hard to believe you didn't know what you were talking about. But I looked at the complaint itself out of curiosity, and it's flimsy and misleading.

The first issue is 100% of the allegedly paywalled text from all 4 articles mentioned in the complaint can be read by non-paying customers for free outside of the paywall. You can't read the whole article, but you can get far enough to read all 4 quotes mentioned in the complaint yourself. The links to each article are in the complaint if you don't believe me. They have nothing to show they bypassed a paywall or that it was trained on unlicensed content.

The second issue is the third exhibit claims it will bypass paywalls when asked. This is demonstrably false because for one, the article they asked it for isn't paywalled, and for two, using their exact prompts word for word doesn't work if you try it yourself.

Two of the four exhibits don't even have screenshots, so there's no evidence it happened in the first place, but more importantly they don't (and apparently won't when asked) disclose what lengths they had to go to in order to get that output. For all we know they gave it 90% of the words and told it to fill in the gaps.

HaywardT , 3 months ago

I don't think that is true. You aren't reselling the movies. It is more like watching the movies then writing a recap or critique of the movies. Do you owe the copyright holder for doing that?

exanime , 3 months ago

Because the actual comparison is that you stole ALL movies, started your own Netflix with them and are lining up to literally make billions by taking the jobs of millions of people, including those you stole from

HaywardT , 3 months ago

I would say it is closer to watching all the movies, regardless of how you got them, and then teaching a film class at UCLA.

A_Very_Big_Fan , 3 months ago

If I paint a melty clock hanging off of a table, how have I stolen from Salvador Dali? What did I "steal" from Tolkien when I drew this?

you stole ALL movies, started your own Netflix with them

The model in question can't even try to distribute copyrighted material. You could have easily checked for yourself, but once again I find myself having to do the footwork for you guys.

exanime , 3 months ago

If you sell your melty clock, yes. It's not "stealing", but you are violating copyright; that's how it works.

The "model in question" is a bit of a prototype. I thought it was clear we are talking about where these models are going.... Maybe you'd get it if you came down off your high horse.

A_Very_Big_Fan , 3 months ago

Dali doesn't own the concept of a melting clock. If I include a melting clock in my own work, as long as it's not his melting clock with all the other elements of his painting, it's fair use.

GPT hasn't been a prototype since before 2018, and the copyright restrictions are only getting tighter every time it's updated so idk what you're on about.

A_Very_Big_Fan , 3 months ago

if I pirate a movie and then only I watch it, I don't think anyone would really think I should be arrested for that, so why is it unacceptable for them but fine for me?

Because it's more analogous to watching a video being broadcast outdoors in public, or looking at a mural someone painted on a wall, and letting it inform your creative works going forward. Not even recording it, just looking at it.

As far as we know, they never pirated anything. What we do know is it was trained on data that literally anybody can go out and look at themselves and let inform their own work. If they're out here torrenting a bunch of movies they don't own or aren't licensing, then the argument against them has merit. But until then, I think all of this is a bunch of AI hysteria over some shit humans have been doing since the first human created a thing.

StarPupil , 3 months ago

An AI (in its current form) isn't a person drawing inspiration from the world around it; it's a program made by people with inputs chosen by those people. If those people didn't ask permission to use other people's licensed work for their product, then they are plagiarising that work, and they should be subject to the same penalties that, for example, a game company using stolen art in their game should face. An AI doesn't become inspired; it copies existing things to predict what it thinks its user wants to see. If we produce a real thinking AI at some point in the future, one with self-determination and whatnot, the story will be different, but for now it isn't.

A_Very_Big_Fan , 3 months ago

What is web scraping if not gathering information from around the world? As long as you're not distributing copyrighted content (and the models in question here don't, btw), then fair use is at play. I'm not plagiarizing the news by reading it or by talking about what I learned, but I would be if I just copy/pasted my response from the article.

Reading publicly available data isn't a copyright violation, and it certainly isn't a violation of fair use. If it were, then you just plagiarized my comment by reading it before you responded.

VirtualOdour , 3 months ago

That's really not how it works, though. It's a web crawler; they're not going to download the whole internet.

And a reason they don't is that it would actually potentially be copyright infringement in some cases, whereas what they do legally isn't (no matter how much people wish the law was set based on their emotions).

autotldr Bot , 3 months ago

This is the best summary I could come up with:

Mira Murati, OpenAI's longtime chief technology officer, sat down with The Wall Street Journal's Joanna Stern this week to discuss Sora, the company's forthcoming video-generating AI.

It's a bad look all around for OpenAI, which has drawn wide controversy — not to mention multiple copyright lawsuits, including one from The New York Times — for its data-scraping practices.

After the interview, Murati reportedly confirmed to the WSJ that Shutterstock videos were indeed included in Sora's training set.

But when you consider the vastness of video content across the web, any clips available to OpenAI through Shutterstock are likely only a small drop in the Sora training data pond.

Others, meanwhile, jumped to Murati's defense, arguing that if you've ever published anything to the internet, you should be perfectly fine with AI companies gobbling it up.

Whether Murati was keeping things close to the vest to avoid more copyright litigation or simply just didn't know the answer, people have good reason to wonder where AI data — be it "publicly available and licensed" or not — is coming from.

The original article contains 667 words, the summary contains 178 words. Saved 73%. I'm a bot and I'm open source!

A_Very_Big_Fan , 3 months ago

Funny how we have all this pissing and moaning about stealing, yet nobody ever complains about this bot actually lifting entire articles and spitting them back out without ads or fluff. I guess it's different when you find it useful, huh?

I like the bot, but I mean y'all wanna talk about copyright violations? The argument against this bot is a hell of a lot more solid than just using data for training.

Guntrigger , 3 months ago

Is this bot a closed system which is being used for profit? No, you know exactly what its source is (the single article it is condensing), and it even has a handy link about how it is open source at the end of every single post.

A_Very_Big_Fan , 3 months ago

It copied all of its text from the article, and it allows me to get all the information from it I want without providing that publisher with traffic or ad revenue. That's not fair use.

I do like the bot, and personally I'd rather it stay, but no matter how you look at it this isn't "fair use" of the article.

Guntrigger , 3 months ago

Interesting take. In all of the defences of LLMs using copyrighted material it's very often highlighted that "fair use" allows exactly such summaries of larger texts.

In reality, "fair use" is ruled on a case by case basis, so it's impossible to judge whether something is or not without it going to court.

A_Very_Big_Fan , 3 months ago

We're not making legislation here, so we don't have that level of burden of proof. But either way, when it comes to factors of fair use that every authority on the matter will list, it violates almost all of them.

It's non-commercial, and it's using facts rather than using a more creative work, so it's got that going for it... But it's

composed of 100% copied material

it's not transformative

it's substituting the original work

it uses officially published work

it specifically copies the "heart" of the work

it bypasses all of the ads and impacts their traffic/metrics so it has a financial impact on them.

It's pretty obvious that there is no argument here. The factors that are violated the hardest and most indisputably are the ones that most authorities on the matter (including the one I linked) agree are the most important.
