Sites scramble to block ChatGPT web crawler after instructions emerge

breaks , 10 months ago

But for large website operators, the choice to block large language model (LLM) crawlers isn't as easy as it may seem. Making some LLMs blind to certain website data will leave gaps of knowledge that could serve some sites very well (such as sites that don't want to lose visitors if ChatGPT supplies their information for them), but it may also hurt others. For example, blocking content from future AI models could decrease a site's or a brand's cultural footprint if AI chatbots become a primary user interface in the future. As a thought experiment, imagine an online business declaring that it didn't want its website indexed by Google in the year 2002—a self-defeating move when that was the most popular on-ramp for finding information online.

Really curious how this will end up

abhibeckert , 10 months ago

I'd bet sites blocking ChatGPT will regret it when (not if) Bing starts using it for search engine relevance.

acastcandream , 10 months ago

Just because you block the GPT crawler doesn’t mean you are no longer indexed.
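
To illustrate the point: robots.txt rules are scoped per user agent, so a rule aimed at GPTBot should leave other crawlers unaffected. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt body, the `can_crawl` helper, and the example.com URL are assumptions for illustration, not any particular site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: disallows only the GPTBot user agent, site-wide.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL; compare GPTBot against ordinary search crawlers.
    for agent in ("GPTBot", "Googlebot", "bingbot"):
        print(agent, can_crawl(agent, "https://example.com/post/1"))
```

Under these assumptions only GPTBot is refused; crawlers with no matching rule fall through to "allowed". Note that robots.txt is advisory, so this only models what a well-behaved crawler would do.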

mp3 , 10 months ago

Lemmy.ca added a block at the nginx level for it:

https://lemmy.ca/comment/1999439

# curl -H 'User-agent: GPTBot' https://lemmy.ca/ -i
HTTP/2 403
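
The actual nginx rule isn't shown in the thread, but the behaviour in the curl output above (a 403 for any request whose User-Agent mentions GPTBot) can be sketched roughly as follows. The `BLOCKED_AGENTS` pattern and the `status_for` helper are hypothetical names for illustration, not Lemmy.ca's configuration.

```python
import re

# Hypothetical pattern: the real nginx rule used by lemmy.ca is not shown
# in the thread. This matches "GPTBot" anywhere in the header value,
# ignoring case.
BLOCKED_AGENTS = re.compile(r"GPTBot", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return the HTTP status a server-level user-agent block would send."""
    if BLOCKED_AGENTS.search(user_agent or ""):
        return 403  # Forbidden: refused before the application is reached
    return 200


if __name__ == "__main__":
    print(status_for("GPTBot"))
    print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/117.0"))
```

Unlike robots.txt, a block like this is enforced by the server rather than left to the crawler, but it still relies on the crawler identifying itself honestly in the User-Agent header.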

thebardingreen , 10 months ago

Hilariously, unless ALL Lemmy instances do this, anyone who federates with you will have to block it too, or any communities they sync with you will still be available to the crawler on their instances...
