Glad this is becoming a meme

Liz , 3 months ago

Ask a man his salary. Do it. How else are you supposed to learn who is getting underpaid? The only way to rectify that problem is to learn about it in the first place.

EmpathicVagrant , 3 months ago

The NLRB ensures that discussion of wages is a protected right.

Talk about your wages.

brbposting , 3 months ago

45 can fix that

EmpathicVagrant , 3 months ago

Plans to, too.

SpaceCowboy , 3 months ago

I think context is important here. Asking a co-worker their salary is fine. Asking about the salary of someone you're on a date with is not fine.

GlitterInfection , 3 months ago

Exactly.

You should have asked them for their W-2 before agreeing to meet.

SpaceCowboy , 3 months ago

Yeah and get their credit score before you even reply.

moistclump , 3 months ago

Ask a woman her age. Do it. How else are you supposed to learn who is getting older? The only way to celebrate that is to learn about it in the first place.

dislocate_expansion Bot , 3 months ago

Anyone know why most have a 2021 internet data cutoff?

potustheplant , 3 months ago

Where do you get that from? At least ChatGPT isn't limited to data from 2021. I haven't researched other models.

RatBin , 3 months ago

GPT-3.5 is limited to 2021. GPT-4, 4.5, and the imaginary upcoming GPT-5 models are not, but that does not mean they aren't limited in their own ways.

dislocate_expansion Bot , 3 months ago

Are you sure those aren't trained until 2021, frozen, and then fine-tuned on later data?

RatBin , 3 months ago

I really don't know, I'm speculating, but OpenAI doesn't know either, that's for sure. So we have the most popular ML system, used by millions, based on... what exactly?

dislocate_expansion Bot , 3 months ago

Yeah GPT 3.5 and some other FOSS models also say 2021

potustheplant , 3 months ago

OpenAI stated in a tweet a few months ago that the limitation is no longer in place.

webghost0101 , 3 months ago

To be fair this tweet doesn't say anything about training data but simply that it theoretically can use present day data if it looks it up online.

For GPT-4, I think it was initially trained up to 2021, but it has gotten updates where data up to December 2023 was used in training. It “knows” this data and does not need to look it up.

Whether they managed to further train the initial gpt4 model to do so or added something they trained separately is probably a trade secret.

dislocate_expansion Bot , 3 months ago

Thanks!

Natanael , 3 months ago

Training from scratch and retraining is expensive. Also, they want to avoid training on ML outputs; they primarily want human-made works as samples, and since the initial public release of LLMs it has become harder to create large datasets without ML stuff in them.

scrubbles , 3 months ago (edited 3 months ago)

There was a good paper that came out recently saying that training on ml data will result in a collapse of cohesion. It's going to be real interesting, I don't know if they'll be able to train as easily ever again

TurtleJoe , 3 months ago

I think it's telling that they acknowledge that the stuff their bots churn out is often such garbage that training their bots on it would ruin them.

Iron_Lynx , 3 months ago

I recall spotting a few things about image generators having their training data contaminated with generated images, and the output becoming significantly worse. So yeah, I guess LLMs and IGAs need natural sources, or it gets more inbred than the Habsburgs.

Donkter , 3 months ago

I think it's just that most are based on chatgpt which cuts off at 2021.

can , 3 months ago

Hey, did you know your profile is set to appear as a bot and as a result many may be filtering your posts and comments? You can change this in your Lemmy settings.

Unless you are a bot... In which case where did you get your data?

dislocate_expansion Bot , 3 months ago

The data wasn't stolen, I can at least assure you of that

can , 3 months ago

You paid Hoffman?

Hjalamanger , 3 months ago

I love how it isn't just an image of the OpenAI logo but also a sad person beside it

unique_hemp , 3 months ago

Oh, that is not just some person, that's the CTO of "Open"AI when asked if YT videos were used to train Sora.

Sauce: https://youtu.be/mAUpxN-EIgU?feature=shared&t=270

brbposting , 3 months ago

Lying MF, unbelievable that’s the best they thought of.

I’m sorry, but we’ve made an internal decision not to reveal our proprietary methodology at this time.

There, now it’s not a lie (hurr durr I’m only the CTO how would I know whether a tiny startup like YOUTUBE was one of our sources)

turkishdelight , 3 months ago

What's wrong with her face?

AnUnusualRelic , 3 months ago

Poor training data presumably.

turkishdelight , 3 months ago

🤣

SpaceCowboy , 3 months ago

It's this face:
https://www.compdermcenter.com/wp-content/uploads/2016/09/vanheusen_5BSQnoz.jpg

She was asked about openai using copyrighted material for training data and literally made that face. Only thing more perfect would've been if she tugged at her collar while doing the face.

zinderic , 3 months ago

It's almost impossible to audit what data got into an AI model. As long as this is true, companies can scrape and use whatever they like, and no one will be the wiser about what data got used or misused in the process. That makes it hard to hold such companies accountable for what they use and how they use it.

po-lina-ergi , 3 months ago

Then it needs to be on companies to prove their audit trail, and until then require all development to be open source

zinderic , 3 months ago

That would be amazing. But it won't happen any time soon, if ever. I mean, just think about all that investment in GPU compute and the need to realize good profit margins. Until there are laws and legislation that require AI companies to open their data pipelines and make public all details about the data sources, I don't think much would happen. They'll just keep feeding in any data they get their hands on, and nothing can stop that today.

InputZero , 3 months ago

Maybe not today and maybe not every AI, but maybe some AI in the near future will have its data sources made explainable. There are a lot of applications where deploying AI would be an improvement over what we have. One example I can bring up is in-silico toxicology experiments. There's been a huge desire to replace as many in-vivo experiments as possible with in-vitro or, even better, in-silico ones to minimize the number of live animals tested on, both for ethical reasons and cost savings. AI has been proposed as a new tool to accomplish this, but it's not there yet. One of the biggest challenges to overcome is making the AI models used in-silico explainable, because we cannot regulate effectively what we cannot explain. Regardless, there is a profit incentive for AI developers to make at least some AI explainable. It's just not where the big money is. To what extent that will apply to all AI, I haven't the slightest idea. I can't imagine OpenAI would do anything to expose their data.

ipkpjersi , 3 months ago

Until there are laws and legislation that require AI companies to open their data pipelines and make public all details about the data sources, I don’t think much would happen.

I don't expect those laws to ever happen. They don't benefit large corporations so there's no reason those laws would ever be prioritized or considered by lawmakers, sadly.

kernelle , 3 months ago

"Publicly available data" - I wonder if that includes Disney's catalogue? Or Nintendo's IP? I think they are veeery selective about their "Publicly available data", it also implies the only requirement for such training data is that it is publicly available, which almost every piece of media ever? How an AI model isn't public domain by default baffles me.

Raykin , 3 months ago

Great point.

Even_Adder , 3 months ago

You should check out this article by Kit Walsh, a senior staff attorney at the EFF, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries.

kernelle , 3 months ago

Great articles, first is one of the best I've read about the implications of fair use. I argue that because of the broadness of human knowledge that is interpreted through these models, everyone is entitled to have unrestricted access to them (not the servers or algorithms used, the models). I'll dub it "the library of the digital age" argument.

programmer_belch , 3 months ago

The problem is that if copyrighted works are used, you could generate a copyrighted artwork that would be made into public domain, stripping its protection. I would love this approach, the problem is the lobbyists don't agree with me.

kernelle , 3 months ago

Not necessarily, if a model is public domain, there could still be a lot of proprietary elements used in interpreting that model and actually running it. If you own the hardware and generate something using AI, I'd say the copyright goes to you. You use AI as the brush to paint your painting and the painting belongs to you, but if a company allows you to use their canvas and their painting tools, it should go to them.

programmer_belch , 3 months ago

I think that if I paint a Mario artwork with my own brush that isn't up to Nintendo's standard, they have the legal power to take it down from wherever I upload it.

kernelle , 3 months ago

Really? Even if your artwork isn't used in a commercial way?

programmer_belch , 3 months ago

I'm really not in the know about these things, but I have seen free fangames taken down because they used copyrighted property even though the creators don't receive a penny.

kernelle , 3 months ago

I'll compare it with the recent takedown of the Switch emulator Yuzu. It's my understanding they actively solicited donations and piracy, both of which could be seen as commercial activities. In a project of that scale, the latter was their downfall; meanwhile, Ryujinx is still up and running. But we'll see if that remains true.

Grimy , 3 months ago

Copyrights and IP laws don't only come into effect if profit is made. Fan products are usually tolerated by companies because it's free advertising and fans get angry when it does get taken down.

When a fan product starts making money, it's usually because it directly competes with the original IP, and then they act. Even then, Etsy has thousands of shops with copyrighted content, but the small profit loss doesn't justify the loss of reputation for the companies.

That being said, it's the user who uploads it who is at fault and not the tool used to create it.

Ultimately, I think it's the platforms that let users upload copyrighted content and celebrity likenesses that should be at fault. Take for example the Taylor Swift debacle. An image generator was used to create the images, sure, but Twitter chose to let them float on their website for a whole day even though they were most likely reported in the first 5 minutes.

There's also the fact that if we start demanding AI doesn't use copyrighted content, it kills the open source scene and cements Google and Microsoft's grip on our economy as we move towards an AI driven society.

kernelle , 3 months ago

Oh yeah, I was just showing an example! There is much more to it than just commercial use, but it's a very quick way to get the attention of businesses, whether it be direct or indirect.

programmer_belch , 3 months ago

I think that if someone uploads Mario doing war crimes to Twitter and it goes viral, there is no "I made it with my own brush" that can save you from Nintendo taking the artwork down.

This example also works with a fanart of a celebrity in a sexual context without any AI use.

programmer_belch , 3 months ago

In my opinion, making AI stop taking copyrighted content can only be enforced by making all AI development open source, datasets and models included. This is the way to loosen the control big monopolies like Google and Microsoft have over it.

the_artic_one , 3 months ago

Yes, fanart is almost certainly copyright infringement unless the copyright holder grants a license. Many companies have an official license for non-commercial fanart and generally nobody cares about it but if someone really wanted to they could absolutely file takedown requests against all fanart of their work.

Natanael , 3 months ago

The existing legal precedent in most places is that most use of ML doesn't count as human expression and doesn't have copyright protection. You have to have significant control over the creation of the output to have copyright (the easiest workaround is simply manually modifying the ML output and then only releasing the modified version).

kernelle , 3 months ago

The existing legal precedent

I know that's how law works, but there is no precedent for AI at this scale, and it will only get worse. What if AI gains full sentience? Are they a legally recognised person? Do they have rights, and do they not own the copyright themselves? All very good questions with no precedent in law.

Natanael , 3 months ago

The law says human creative expression

kernelle , 3 months ago

At what point does human creative expression become a sentient being?

jaybone , 3 months ago

If you rent a brush to paint with, is the painting not yours? If you rent a musical instrument to record an original song with, is the song not yours?

kernelle , 3 months ago

Exactly! When you pay for a service you own the copyright, like having a Photoshop license. I meant other situations, where it's free or provided as a research tool to engineers under a company.

rotopenguin , 3 months ago

Read the fine print on that agreement

redcalcium , 3 months ago

There is a rumor that OpenAI downloaded the entirety of LibGen to train their AI models. No definite proof yet, but it seems very likely.

https://torrentfreak.com/authors-accuse-openai-of-using-pirate-sites-to-train-chatgpt-230630/

100_kg_90_de_belin , 3 months ago

"It just like me fr fr" (cit.)
