underisk (@underisk@lemmy.ml), edited

The part you're missing is the metadata. AI models (neural networks, specifically) are trained on the data along with some sort of contextual metadata related to what they're being trained to do. For example, with reddit posts they would feed in things like "this post is popular", "this post was controversial", "this post has many views", etc. in addition to the post text, if they wanted an AI that could spit out posts that are likely to do well on reddit.
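To make that concrete, here's a rough sketch of what pairing post text with metadata could look like. The field names (`upvotes`, `downvotes`, `views`) and all the thresholds are made up for illustration; this isn't how any real pipeline is known to work.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical fields; a real dataset would have its own schema.
    text: str
    upvotes: int
    downvotes: int
    views: int


def label_post(post, popular_at=100, controversial_ratio=0.4, many_views_at=10_000):
    """Turn raw vote/view counts into the kind of tags described above.

    All three thresholds are arbitrary assumptions.
    """
    labels = []
    total_votes = post.upvotes + post.downvotes
    if post.upvotes - post.downvotes >= popular_at:
        labels.append("popular")
    # "Controversial" here means the minority side holds a large share of votes.
    if total_votes > 0 and min(post.upvotes, post.downvotes) / total_votes >= controversial_ratio:
        labels.append("controversial")
    if post.views >= many_views_at:
        labels.append("many views")
    return labels


def build_training_set(posts):
    """Pair each post's text with its metadata labels: (input, target) examples."""
    return [(post.text, label_post(post)) for post in posts]


posts = [
    Post("Cute cat picture", upvotes=500, downvotes=20, views=20_000),
    Post("Hot take about tabs vs spaces", upvotes=60, downvotes=50, views=900),
    Post("First post, hello", upvotes=1, downvotes=0, views=5),
]
training_set = build_training_set(posts)
```

The model never sees the raw counts in this sketch, only the text plus the tags derived from them, which is the "metadata" part.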

Quantity is a concern: you need to reach a fairly large threshold of data to have any hope of training an AI well, but there are diminishing returns after a certain point. The more data you feed it, the more metadata you potentially have to add, and some of that can only be provided by humans. For instance, with sentiment analysis you need a human being to sit down and label various samples of text with the emotional response they convey, since computers can't really do that automatically.

Quality is less of a concern. Bad quality data, or data with poorly applied metadata, will result in an AI with less "accuracy". A few outliers and mistakes here and there won't be too impactful, though. Quality here could be defined by how well your training set represents the kind of input you'll be expecting it to work with.
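A toy way to see the "few mistakes are fine, lots of mistakes aren't" point. This uses a nearest-centroid classifier on made-up 1D numbers, nothing like a real neural network, so treat it purely as an illustration of label noise.

```python
def train_centroids(samples):
    """Compute the mean input value for each label (0 and 1)."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, label in samples:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Pick the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids, samples):
    correct = sum(1 for x, label in samples if predict(centroids, x) == label)
    return correct / len(samples)


def flip_labels(samples, indices):
    """Simulate badly applied metadata by flipping the label at the given positions."""
    return [(x, 1 - y) if i in indices else (x, y) for i, (x, y) in enumerate(samples)]


# Made-up data: label 0 clusters around 0..9, label 1 around 20..29.
clean = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(20, 30)]
test_set = [(2.0, 0), (5.0, 0), (8.0, 0), (22.0, 1), (25.0, 1), (28.0, 1)]

few_mistakes = flip_labels(clean, {0, 19})             # 2 of 20 labels wrong
many_mistakes = flip_labels(clean, set(range(2, 18)))  # 16 of 20 labels wrong

acc_clean = accuracy(train_centroids(clean), test_set)
acc_few = accuracy(train_centroids(few_mistakes), test_set)
acc_many = accuracy(train_centroids(many_mistakes), test_set)
```

With two bad labels the centroids shift a bit but every test point still lands on the right side (`acc_few` stays at 1.0). With most labels wrong the centroids effectively swap and `acc_many` drops to 0.0.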
