this post was submitted on 08 Jun 2025
209 points (98.2% liked)

Fuck AI


It's impossible. I set up this instance just to browse Lemmy from my own server, but it was slow as hell the whole week. I added new pods, moved Postgres to its own pod, pictrs to another, etc.

It was still slow as hell, and I didn't know why until a few hours ago: 500 GETs in a MINUTE from ClaudeBot and GPTBot. WTH is this? Why? I blocked those user agents with a blocking extension for NGINX, and now it works.

WHY? So google can say that you should eat glass?

Life is hell now. Before, at least anyone could put up a website; now even that is painful.

Sorry for the rant.

[–] carrylex@lemmy.world 7 points 13 hours ago (2 children)

So I just had a look at your robots.txt:

User-Agent: *
  Disallow: /login
  Disallow: /login_reset
  Disallow: /settings
  Disallow: /create_community
  Disallow: /create_post
  Disallow: /create_private_message
  Disallow: /inbox
  Disallow: /setup
  Disallow: /admin
  Disallow: /password_change
  Disallow: /search/
  Disallow: /modlog
  Crawl-delay: 60

You explicitly allow searching your content by bots... That's likely one of the reasons why you get bot traffic.
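For anyone who wants to check what that file actually tells a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name and the paths `/post/1` and `/search/cats` are only illustrative assumptions, and the result says nothing about crawlers that ignore robots.txt altogether.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, copied as-is (the parser strips the indentation).
ROBOTS_TXT = """\
User-Agent: *
  Disallow: /login
  Disallow: /login_reset
  Disallow: /settings
  Disallow: /create_community
  Disallow: /create_post
  Disallow: /create_private_message
  Disallow: /inbox
  Disallow: /setup
  Disallow: /admin
  Disallow: /password_change
  Disallow: /search/
  Disallow: /modlog
  Crawl-delay: 60
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    # Hypothetical paths, chosen only to show one listed and one unlisted prefix.
    for path in ("/login", "/search/cats", "/post/1"):
        print(path, parser.can_fetch("GPTBot", path))
    print("crawl delay:", parser.crawl_delay("GPTBot"))
```

Anything not listed under `Disallow` is open to every user agent, so posts and comments stay crawlable under this file; the only brake is the requested 60 second delay, which a crawler is free to ignore.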

[–] plz1@lemmy.world 1 points 7 hours ago

AI crawlers ignore robots.txt. The only way to get them to stop is with active countermeasures.
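The simplest active countermeasure is the kind of user-agent block the original poster describes. The sketch below shows the idea as a small WSGI middleware in Python; it is not the NGINX extension from the post, and `block_bots`, `demo_app`, and `status_for` are hypothetical names made up for illustration. It only stops bots that identify themselves honestly.

```python
# Substrings taken from the post; matching is case-insensitive.
BLOCKED_AGENTS = ("claudebot", "gptbot")


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocked bot name."""
    ua = user_agent.lower()
    return any(bot in ua for bot in BLOCKED_AGENTS)


def block_bots(app):
    """Wrap a WSGI app so blocked bots get a 403 before the app runs."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def status_for(user_agent: str) -> str:
    """Send one fake request through the wrapped app and return its status."""
    seen = []
    app = block_bots(demo_app)
    body = app({"HTTP_USER_AGENT": user_agent},
               lambda status, headers: seen.append(status))
    list(body)  # consume the response body
    return seen[0]


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; GPTBot/1.1)"))
    print(status_for("Mozilla/5.0 Firefox/126.0"))
```

The blocked request never reaches the application, so it costs no database query, which is the whole point on a small server.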

[–] MonkderVierte@lemmy.zip 13 points 21 hours ago (2 children)

Patience, the AI bubble will burst soon.

[–] Zetta@mander.xyz 1 points 3 hours ago* (last edited 3 hours ago)

It won't crash soon, sorry Charlie. Maybe in like 2-5 years, but honestly I don't think there will ever be a "crash", just fewer AI buzzwords in everything.

[–] AstralPath@lemmy.ca 7 points 20 hours ago
[–] flamingos@feddit.uk 90 points 1 day ago* (last edited 1 day ago) (2 children)

You can enable Private Instance in your admin settings, which means only logged-in users can see content. This will prevent AI scrapers from slowing down your instance, as all they'll see is an empty homepage, so no DB calls. As long as you're on 0.19.11, federation will still work.

[–] potatoguy@potato-guy.space 42 points 1 day ago

Enabled, thanks for the tip!

[–] melroy@kbin.melroy.org 12 points 1 day ago

Same for Mbin.

[–] Mwa@thelemmy.club 12 points 1 day ago* (last edited 1 day ago) (1 children)

You can use either Cloudflare (proprietary) or Anubis (FOSS).

[–] jagged_circle@feddit.nl -1 points 22 hours ago (1 children)
[–] Mwa@thelemmy.club 8 points 22 hours ago (1 children)
[–] jagged_circle@feddit.nl 4 points 18 hours ago (1 children)

Because it harms marginalized folks' ability to access content while also letting evil corp (and their fascist government) view (and modify) all encrypted communication with your site and its users.

It's bad.

[–] jerkface@lemmy.ca 1 points 14 hours ago (1 children)

For clarity, you are referring to Cloudflare and not Anubis?

[–] jagged_circle@feddit.nl 1 points 12 hours ago (1 children)

I am referring to Cloudflare, but I would expect Anubis to be the same if it provides DoS fronting.

[–] monogram@feddit.nl 2 points 8 hours ago (1 children)

Anubis works in a very different way than Cloudflare.

[–] jagged_circle@feddit.nl 1 points 7 hours ago

How well does it work in Tor Browser in strict mode?

[–] xep@fedia.io 49 points 1 day ago (1 children)

At some point they're going to try to evade detection to continue scraping the web. The cat and mouse game continues except now the "pirates" are big tech.

[–] brandon@piefed.social 32 points 1 day ago* (last edited 1 day ago) (2 children)

They already do. ("They" meaning AI generally, I don't know about Claude or ChatGPT's bots specifically). There are a number of tools server admins can use to help deal with this.

See also:

[–] lurch@sh.itjust.works 12 points 1 day ago

These solutions have the side effect of making the bots stay on your site longer and generate more traffic. It's not for everyone.

[–] Black616Angel@discuss.tchncs.de 1 points 19 hours ago

https://zadzmo.org/ is dead already and Ars Technica is writing about them so...

[–] termaxima@programming.dev 22 points 1 day ago (2 children)

Anubis + Nepenthes is the answer.

[–] Finch9678@europe.pub 13 points 1 day ago

Article for whoever was unaware like me.

[–] parpol@programming.dev 37 points 1 day ago (1 children)

Use Anubis. That's pretty much the only thing you can do against bots that they have no way of circumventing.

[–] potatoguy@potato-guy.space 15 points 1 day ago (1 children)

Yeah, going to install it this week, but the nginx extension seemed to solve the issue.

[–] melroy@kbin.melroy.org 4 points 1 day ago

Which extension are you using, if I may ask?

[–] jagged_circle@feddit.nl 4 points 22 hours ago* (last edited 18 hours ago) (1 children)

Just cache. Read-only traffic should add negligible load to your server. Otherwise you're doing something horribly wrong.

[–] potatoguy@potato-guy.space 5 points 21 hours ago (2 children)

The pods have 1 CPU and 1 GB of RAM each. Postgres goes to 100% CPU at 500 requests per minute; after I added the NGINX extension, it dropped to 10% at most. On weaker servers it's these bots that make hell on earth, not the config.

[–] jerkface@lemmy.ca 5 points 21 hours ago (1 children)

If it's hitting postgres it's not hitting the cache. Do you have a caching reverse proxy in front of your web application?

[–] potatoguy@potato-guy.space 1 points 20 hours ago* (last edited 20 hours ago) (1 children)

I don't have a cache, but the problem is solved now. I can browse Lemmy haha.

[–] jerkface@lemmy.ca 4 points 19 hours ago (1 children)

The nginx instance you have in front of your app can perform caching and avoid hitting your app. The advantage is that it will improve performance even against the most stealthy of bots, including those that don't even exist yet. The disadvantage is that the AI scum get what they want.
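To make the idea concrete, here is a minimal Python sketch of what a caching layer in front of an app does. It is not how nginx implements its cache; `TTLCache`, the 60 second lifetime, and the path `/post/1` are assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve repeated reads from memory; call the backend only on a miss."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # the expensive call, e.g. app + database
        self._ttl = ttl_seconds  # how long a cached page stays fresh
        self._clock = clock      # injectable so the cache can be tested
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        hit = self._store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: the backend is not touched
        body = self._fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    backend_calls = []

    def slow_backend(path: str) -> str:
        backend_calls.append(path)  # stands in for a database query
        return f"page for {path}"

    cache = TTLCache(slow_backend, ttl_seconds=60.0)
    for _ in range(500):  # the 500 GETs in a minute from the post
        cache.get("/post/1")
    print("backend calls:", len(backend_calls))
```

With every request hitting the same page inside the freshness window, the backend is called once and the other 499 reads come from memory. Scrapers that walk many different pages benefit less, since each distinct path is a miss the first time.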

[–] potatoguy@potato-guy.space 2 points 19 hours ago (1 children)

Oh, cool. I'm going to look at it!

[–] jagged_circle@feddit.nl 2 points 18 hours ago

If that doesn't work for you, also look at Varnish and Squid.

[–] jagged_circle@feddit.nl 1 points 18 hours ago

Load should be near zero for reads.

[–] lena@gregtech.eu 9 points 1 day ago

Cloudflare has pretty good protection against this, but I totally understand not wanting to use Cloudflare

[–] melroy@kbin.melroy.org 7 points 1 day ago

Haha, just wait until you get DDoSed by anonymous user agents. I have been there.

I'm talking 40k requests per 5 seconds.