Mirror of https://github.com/ai-robots-txt/ai.robots.txt.git, synced 2025-10-05 15:42:47 +02:00
Simplify htaccess rewrite rule
https://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_f

fix #159

Signed-off-by: Sebastian Davids <sdavids@gmx.de>
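As a rough illustration of what the rewrite rule in this commit does (this is not code from the repository), the sketch below mimics the `RewriteCond`/`RewriteRule` pair in Python. A request is refused when its User-Agent contains a blocked name, matched case-insensitively as with `[NC]`, and the requested path is anything other than `robots.txt`. The linked Apache documentation states that `[F]` implies `[L]`, which is why dropping `L` leaves this behavior unchanged. The shortened name list and the helper `is_forbidden` are assumptions made for the example.

```python
import re

# Assumption: a shortened stand-in for the full user-agent alternation in the diff.
BLOCKED_AGENTS = re.compile(r"(GPTBot|ClaudeBot|Kangaroo\ Bot)", re.IGNORECASE)

# Same path pattern as the RewriteRule; the rule itself negates it with "!".
ROBOTS_TXT = re.compile(r"^/?robots\.txt$")


def is_forbidden(user_agent: str, path: str) -> bool:
    """Return True where the rule would answer 403 Forbidden.

    Hypothetical helper: it only models the condition and the negated
    path match, not Apache's full rewrite processing.
    """
    agent_matches = BLOCKED_AGENTS.search(user_agent) is not None
    is_robots_txt = ROBOTS_TXT.search(path) is not None
    return agent_matches and not is_robots_txt


print(is_forbidden("Mozilla/5.0 (compatible; GPTBot/1.0)", "index.html"))  # True
print(is_forbidden("gptbot", "robots.txt"))                                # False
print(is_forbidden("Mozilla/5.0 Firefox/130.0", "index.html"))             # False
```

Blocked crawlers can still fetch `robots.txt`, so they are able to read the disallow rules even though every other path is refused.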
@@ -1,3 +1,3 @@
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|aiHitBot|Amazonbot|Andibot|anthropic\-ai|Applebot|Applebot\-Extended|Awario|bedrockbot|Brightbot\ 1\.0|Bytespider|CCBot|ChatGPT\-User|Claude\-SearchBot|Claude\-User|Claude\-Web|ClaudeBot|cohere\-ai|cohere\-training\-data\-crawler|Cotoyogi|Crawlspace|Datenbank\ Crawler|Devin|Diffbot|DuckAssistBot|Echobot\ Bot|EchoboxBot|FacebookBot|facebookexternalhit|Factset_spyderbot|FirecrawlAgent|FriendlyCrawler|Gemini\-Deep\-Research|Google\-CloudVertexBot|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|meta\-externalagent|Meta\-ExternalAgent|meta\-externalfetcher|Meta\-ExternalFetcher|MistralAI\-User|MistralAI\-User/1\.0|MyCentralAIScraperBot|netEstate\ Imprint\ Crawler|NovaAct|OAI\-SearchBot|omgili|omgilibot|Operator|PanguBot|Panscient|panscient\.com|Perplexity\-User|PerplexityBot|PetalBot|PhindBot|Poseidon\ Research\ Crawler|QualifiedBot|QuillBot|quillbot\.com|SBIntuitionsBot|Scrapy|SemrushBot\-OCOB|SemrushBot\-SWA|Sidetrade\ indexer\ bot|SummalyBot|TikTokSpider|Thinkbot|Timpibot|VelenPublicWebCrawler|WARDBot|Webzio\-Extended|wpbot|YandexAdditional|YandexAdditionalBot|YouBot) [NC]
-RewriteRule !^/?robots\.txt$ - [F,L]
+RewriteRule !^/?robots\.txt$ - [F]
@@ -169,7 +169,7 @@ def json_to_htaccess(robot_json):
     # User agents that contain any of the blocked values.
     htaccess = "RewriteEngine On\n"
     htaccess += f"RewriteCond %{{HTTP_USER_AGENT}} {list_to_pcre(robot_json.keys())} [NC]\n"
-    htaccess += "RewriteRule !^/?robots\\.txt$ - [F,L]\n"
+    htaccess += "RewriteRule !^/?robots\\.txt$ - [F]\n"
     return htaccess

 def json_to_nginx(robot_json):
@@ -1,3 +1,3 @@
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (AI2Bot|Ai2Bot\-Dolma|Amazonbot|anthropic\-ai|Applebot|Applebot\-Extended|Bytespider|CCBot|ChatGPT\-User|Claude\-Web|ClaudeBot|cohere\-ai|Diffbot|FacebookBot|facebookexternalhit|FriendlyCrawler|Google\-Extended|GoogleOther|GoogleOther\-Image|GoogleOther\-Video|GPTBot|iaskspider/2\.0|ICC\-Crawler|ImagesiftBot|img2dataset|ISSCyberRiskCrawler|Kangaroo\ Bot|Meta\-ExternalAgent|Meta\-ExternalFetcher|OAI\-SearchBot|omgili|omgilibot|Perplexity\-User|PerplexityBot|PetalBot|Scrapy|Sidetrade\ indexer\ bot|Timpibot|VelenPublicWebCrawler|Webzio\-Extended|YouBot|crawler\.with\.dots|star\*\*\*crawler|Is\ this\ a\ crawler\?|a\[mazing\]\{42\}\(robot\)|2\^32\$|curl\|sudo\ bash) [NC]
-RewriteRule !^/?robots\.txt$ - [F,L]
+RewriteRule !^/?robots\.txt$ - [F]
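The generator change and the regenerated files above fit together roughly as in the following sketch. `json_to_htaccess` follows the lines shown in the diff, but the body of `list_to_pcre` does not appear in this commit. Here it is assumed to escape each name with `re.escape` and join the results with `|`, which is consistent with escaped fixture entries such as `curl\|sudo\ bash`. The sample input dictionary is hypothetical.

```python
import re


def list_to_pcre(names):
    # Assumption: the real helper is not part of this diff. This stand-in
    # escapes regex metacharacters and builds one alternation group.
    return "(" + "|".join(re.escape(name) for name in names) + ")"


def json_to_htaccess(robot_json):
    # User agents that contain any of the blocked values.
    htaccess = "RewriteEngine On\n"
    htaccess += f"RewriteCond %{{HTTP_USER_AGENT}} {list_to_pcre(robot_json.keys())} [NC]\n"
    # [F] alone is enough: per the Apache docs it implies [L].
    htaccess += "RewriteRule !^/?robots\\.txt$ - [F]\n"
    return htaccess


# Hypothetical input; only the keys (user-agent names) are used.
example = json_to_htaccess({"GPTBot": {}, "Kangaroo Bot": {}, "curl|sudo bash": {}})
print(example)
```

Escaping matters for names like `curl|sudo bash`: without it, the `|` would split the entry into two separate alternatives instead of matching the literal string.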