
IphtashuFitz

robots.txt is 100% honor-based. Well-known bots like Googlebot, Bingbot, etc. definitely honor it. But there are also plenty of bots that completely ignore it.

I would hope the bots used to collect LLM training data honor it, but there’s no way to know for certain. And all it really takes is one bot ignoring it for the content of your website to end up in a random set of training data…
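To make the "honor-based" part concrete, here is a rough sketch of what a well-behaved crawler does before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleLLMBot` user agent, the `example.com` URL, and the `is_allowed` helper are all made up for illustration. The point is that the check runs entirely on the crawler's side: a bot that never calls it is not blocked by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) bot, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url.

    This is a voluntary check. The server does not enforce it; a crawler
    that skips this call can still request the page.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"  # hypothetical URL
    for agent in ("ExampleLLMBot", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

A polite bot would only request the page when `is_allowed` returns `True`. Actually keeping a bot out takes something server-side, such as blocking by user agent or IP, and even that only works against bots that identify themselves honestly.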
