
my_hat_stinks

They'll use old comments either way. Using an up-to-date dataset means using one that's already tainted by LLM-generated content, and training a model on its own output is not great.

Incidentally, this also makes Lemmy data less valuable: most of Lemmy's popularity came after the rise of LLMs, so there's no significant untainted data from before then.
