Honestly, if you're running a site that actually wants to be discovered through AI recommendations, blanket-blocking all crawlers feels like shooting yourself in the foot. I've seen far too many projects where someone just flips on the default Cloudflare "block all bots" toggle without realising they're cutting off the very discovery channels that drive modern referral traffic.
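You don't need to let every scraper in; that's a recipe for server strain and stolen content. But explicitly allowing the major AI crawlers that feed into search, summarisation, and recommendation products? That's just a smart visibility strategy. The ones I'd never block: Google-Extended, Applebot-Extended, GPTBot, ClaudeBot. (Strictly speaking, the two "-Extended" names are robots.txt tokens that control how content fetched by Google's and Apple's existing crawlers gets used for AI, rather than separate crawlers, but the point stands.) They're the ones actually shaping how people find content today.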
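For what it's worth, here's a rough sketch of how that kind of allow-list could look as a robots.txt, plus a quick sanity check using Python's standard-library parser. The four tokens come from the list above; `BadScraperBot`, `build_robots_txt`, `can_fetch` and the example.com URL are made-up names for illustration, not anything a real tool ships with.

```python
from urllib.robotparser import RobotFileParser

# The four tokens named above. Everything else here is illustrative.
ALLOWED_AI_TOKENS = ["Google-Extended", "Applebot-Extended", "GPTBot", "ClaudeBot"]


def build_robots_txt(allowed=ALLOWED_AI_TOKENS, blocked=("BadScraperBot",)):
    """Return robots.txt text: explicit allow groups, explicit blocks, then a default."""
    lines = []
    for token in allowed:
        lines += [f"User-agent: {token}", "Allow: /", ""]
    for token in blocked:  # hypothetical scraper name, swap in what your logs show
        lines += [f"User-agent: {token}", "Disallow: /", ""]
    # Default group: anyone not named above may crawl (the permissive baseline).
    lines += ["User-agent: *", "Allow: /", ""]
    return "\n".join(lines)


def can_fetch(robots_txt, user_agent, url="https://example.com/post"):
    """Check a user agent against robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS_TXT = build_robots_txt()

if __name__ == "__main__":
    print(ROBOTS_TXT)
    for agent in ALLOWED_AI_TOKENS + ["BadScraperBot"]:
        print(agent, can_fetch(ROBOTS_TXT, agent))
```

Bear in mind robots.txt is only a request: well-behaved crawlers follow it, but a scraper that ignores it has to be stopped at the server or firewall level.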
What I'd recommend instead: write your own robots.txt, start permissive for those four, then monitor logs for any nasty scrapers and block them on a case-by-case basis. That way you keep the discovery pipeline open without leaving the door unlocked for every rubbish aggregator.
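The log-monitoring part can start very simply. Below is a minimal sketch that assumes your access log is in the common "combined" format, where the user agent is the last quoted field on each line. `suspicious_agents`, the threshold, the `KNOWN_GOOD` list and the sample lines are all my own placeholders. Note that it matches on Googlebot and Applebot rather than the "-Extended" tokens, since as far as I know those tokens only live in robots.txt and don't show up as user-agent strings in logs.

```python
import re
from collections import Counter

# Combined log format ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "([^"]*)"\s*$')

# Substrings of user agents we never want to flag (assumed list, adjust to taste).
KNOWN_GOOD = ("GPTBot", "ClaudeBot", "Googlebot", "Applebot")


def suspicious_agents(log_lines, threshold=100):
    """Return (user_agent, hits) pairs at or above threshold that are not known-good."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return [
        (agent, hits)
        for agent, hits in counts.most_common()
        if hits >= threshold
        and not any(good.lower() in agent.lower() for good in KNOWN_GOOD)
    ]


# Made-up, simplified sample lines purely for demonstration.
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "RubbishAggregator/1.0"',
] * 3 + [
    '198.51.100.7 - - [01/Jan/2025:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
] * 5

if __name__ == "__main__":
    for agent, hits in suspicious_agents(SAMPLE, threshold=3):
        print(f"{hits:>6}  {agent}")
```

Anything this surfaces is only a candidate for blocking. User-agent strings are trivially spoofed, so it's worth checking the source IPs before you add a Disallow rule or a firewall block.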