
coolkicks

@coolkicks@lemmy.world


Is there a simple way to severely impede web scraping and LLM data collection of my website?

I am working on a simple static website that gives visitors basic information about myself and the work I do. I want to use this as a way to introduce myself to potential clients, collaborators, etc., rather than rely solely on LinkedIn as my visiting card....

coolkicks ,

If LLMs were accurate, I could support this. But at this point there’s too much outright incorrect information coming from LLMs.

“Letting AI scrape your website is the best way to amplify your personal brand, and you should avoid robots.txt or use agent filtering to effectively market yourself. -ExtremeDullard”

isn’t what you said, but is what an LLM will say you said.
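
For anyone who actually wants the opposite of what that invented quote recommends, here is a minimal robots.txt sketch of the “agent filtering” it mentions. GPTBot (OpenAI) and CCBot (Common Crawl) are real AI-related crawler user agents; compliant crawlers honor the file, while bad actors simply ignore it, so treat this as a polite request rather than a hard barrier:

```
# robots.txt - minimal sketch asking known AI crawlers to stay out.
# Well-behaved bots respect this; nothing technically enforces it.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```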

coolkicks ,

I used to be in credit risk for a very large stock market company.

Calling the bottom of the market is the same as betting big and getting 21 in blackjack.

Super cool when it happens, but not skill. The number of grown men I had to hear crying because they were dollar-cost averaging down to the bottom until they went broke still disturbs me.

I’m happy this worked for you, but it was not skill.

coolkicks ,

GitHub has 28 million public repos.

GitLab has less than a tenth as many: under a million in 2020, and nearly 80% of those without a FOSS license.

Is it everyone’s favorite, or the best, or the most feature-rich? Nah. Is it where the FOSS projects are? Yes.

coolkicks ,

Sure, self-hosting is a great option for very large projects, but a random Python library to help with an analytics workflow isn’t going to self-host. Those projects, along with 27,999,990 others, have chosen GitHub, oftentimes explicitly to reduce the barrier to contribution.

Also, all of those examples are built on thousands of other FOSS projects, 99% of which aren’t self-hosting. This is the same as arguing Amazon is the only bookseller while ignoring the thousands of independent book publishers creating the books Amazon is selling.

Are there any genuine benefits to AI?

I can see some minor benefits - I use it for the odd bit of mundane writing, and some of the image creation stuff is interesting, and I know that a lot of people use it for coding etc - but mostly it seems to be about making more cash for corporations and stuffing the internet with bots and fake content. Am I missing something...

coolkicks ,

Lots of boring applications that are beneficial in focused use cases.

Computer vision is great for optical character recognition: think scanning documents to digitize them, depositing checks from your phone, etc. There are also some good computer vision use cases for identifying plants from a photo, facial recognition for labeling the photos on your phone, and so on.
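
As a concrete sketch of the OCR case (assuming the pytesseract wrapper around Tesseract plus Pillow; the file name scan.png is a placeholder, not anything from the post):

```python
# Minimal OCR sketch: extract the text from a scanned page.
# Assumes `pip install pillow pytesseract` and a local Tesseract install;
# "scan.png" is a placeholder for whatever document image you have.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("scan.png"))
print(text)  # the characters recognized on the page
```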

There are also some decent opportunities in medical research: protein analysis for the development of medicine, and (again) computer vision to detect cancerous cells and to read X-rays and MRIs.

Today all the hype is about generative AI for content creation, which is enabled by Transformer technology, but it’s basically just version 2 (or maybe more) of Recurrent Neural Networks, or RNNs. Back in 2015, I remember this essay, The Unreasonable Effectiveness of RNNs, being just as novel and exciting as ChatGPT.

We’re still burdened with this comment from the first paragraph, though.

Within a few dozen minutes of training my first baby model (with rather arbitrarily-chosen hyperparameters) started to generate very nice looking descriptions of images that were on the edge of making sense.

This will likely be a very difficult chasm to cross, because there is a lot more to human knowledge than predicting the next letter in a word or the next word in a sentence. We have knowledge domains where, as individuals, we may be brilliant, and others where we may be ignorant. Generative AI is trying to become a genius in all areas at once, and finds itself borrowing “knowledge” from Shakespearean literature to answer questions about modern philosophy because the order of the words in the sentences is roughly similar given a noun it used 200 words ago.
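
To make the “next letter / next word” point concrete, here is a minimal next-character-prediction sketch with an RNN (PyTorch; the toy corpus, model size, and training loop are arbitrary assumptions for illustration, not the essay’s actual code):

```python
# Next-character prediction with a small RNN, char-rnn style.
import torch
import torch.nn as nn

text = "hello world, hello model, hello reader"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x, h=None):
        out, h = self.rnn(self.embed(x), h)
        return self.head(out), h  # logits over the next character

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Input is the text; the target is the same text shifted one character
# left, so the model's only job is guessing which character comes next.
ids = torch.tensor([[stoi[c] for c in text]])
x, y = ids[:, :-1], ids[:, 1:]
for step in range(300):
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Feed "hell" and greedily pick the most likely next character.
seed = torch.tensor([[stoi[c] for c in "hell"]])
logits, _ = model(seed)
print(itos[int(logits[0, -1].argmax())])  # very likely "o" after training
```

That is the entire trick: everything the model “knows” is whatever helps it guess the next symbol, which is exactly why it can sound fluent while blending unrelated domains.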

Enter Tiny Language Models. Using the technology from large language models, but hyper-focused on writing children’s stories, appears to show real progress through specialization, and could allow generative AI to stay focused and stop sounding incoherent when the details matter.
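
As a rough sketch of what running one of these tiny, story-specialized models looks like (assuming the Hugging Face transformers library; the checkpoint name below is the publicly released TinyStories model, an assumption on my part rather than anything named above):

```python
# Sample a short children's story from a tiny specialized model.
# Assumes `pip install transformers torch`.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "roneneldan/TinyStories-33M"  # assumed public checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

prompt = "Once upon a time there was a small robot who"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)
print(tok.decode(out[0], skip_special_tokens=True))
```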

This is relatively full circle, in my opinion: RNNs were designed to solve one problem well, then they unexpectedly generalized well, and the hunt was on for the premier generalized model. That hunt advanced the technology by enormous amounts, and now that technology is being used in Tiny Models, which are again looking to solve specific use cases extraordinarily well.

Still very TBD what use cases can be identified that add value, but recent advancements seem ripe to transition gen AI from a novelty to something truly game-changing.
