
NeatNit,

It's all but guaranteed. Reminds me of this Computerphile video: https://youtu.be/WO2X3oZEJOA?t=874 TL;DW: there were "glitch tokens" in GPT (and therefore ChatGPT) which undeniably came from Reddit usernames.

Note: there's no proof that these Reddit usernames were in the training data (there are even reasons to assume they weren't; watch the video for context), but there's no doubt that OpenAI had already scraped Reddit data at some point before training, probably mixed in with the rest of their text data. I see no reason to assume they completely removed all Reddit text before training. The video suggests reasons and evidence that they removed certain subreddits, not all of Reddit.
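
If anyone wants to poke at this themselves, here's a minimal sketch (not from the video) using OpenAI's tiktoken library, assuming the GPT-2/GPT-3-era "r50k_base" encoding. "SolidGoldMagikarp" is the best-known of these username-derived glitch tokens; if it encodes to a single token id, the whole username is literally one entry in the tokenizer's vocabulary, which only happens when the string showed up a lot in the data the tokenizer was built from.

```python
# Rough sketch: check whether a Reddit username is a single entry in the
# GPT-2/GPT-3 tokenizer vocabulary. Requires `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")  # GPT-2-era BPE encoding

for name in ["SolidGoldMagikarp", " SolidGoldMagikarp"]:  # with/without leading space
    ids = enc.encode(name)
    # A one-element list means the entire username is a single vocabulary
    # entry, i.e. it appeared often in the tokenizer's training data.
    print(f"{name!r} -> {ids} ({len(ids)} token(s))")
```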
