
Robots.txt

Robots.txt is a file that tells crawlers, including AI bots, which parts of your site they may access.

Definition

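Robots.txt is a plain-text file served from the root of your site (for example, https://example.com/robots.txt) that tells web crawlers which URL paths they may fetch. It follows the Robots Exclusion Protocol (RFC 9309). Compliant crawlers honor it, but it is a request rather than an access control. For GEO, your robots.txt rules decide whether AI crawlers such as GPTBot, ClaudeBot, and PerplexityBot may fetch your content for model training or real-time retrieval. Google-Extended works slightly differently: it is not a separate crawler. It is a token that tells Google whether content it has crawled from your site may be used for its Gemini models.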
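Here is a small sketch of how crawlers read the file. It uses Python's standard-library urllib.robotparser as a stand-in for a crawler's own parser, and real crawlers may differ in edge cases. The robots.txt content and its paths are made up for illustration. The key behavior: a crawler follows only the group that names it. GPTBot therefore ignores the "*" rules and may fetch /staging/, while ClaudeBot has no group of its own and may not.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; the paths are made up. GPTBot gets its own group,
# every other crawler falls back to the "*" group. urllib.robotparser treats
# a blank line as the end of a group, so rules sit directly under User-agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /account/

User-agent: *
Disallow: /account/
Disallow: /staging/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch(user_agent, url) reports whether that crawler may fetch the path.
for agent in ("GPTBot", "ClaudeBot"):
    for path in ("/blog/geo-guide", "/account/", "/staging/"):
        print(agent, path, parser.can_fetch(agent, path))
```

To keep GPTBot out of /staging/ as well, repeat that Disallow line inside its own group.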

Why It Matters

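Compliant AI crawlers read robots.txt before fetching your pages, so its rules decide which AI systems can use your content. Blocking a crawler keeps your pages out of whatever that crawler feeds, whether model training or real-time retrieval. That can keep your content out of AI-generated answers.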
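Blocking applies only to the token you name. In this illustrative sketch, again using Python's urllib.robotparser as a stand-in for each crawler's own parser, the file opts out of Google-Extended and GPTBot. Googlebot (Google Search crawling) and ClaudeBot are left unaffected. Google documents that Google-Extended does not affect inclusion in Google Search. The code simply queries each token against the same file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out via Google-Extended and GPTBot,
# while every other crawler (including Googlebot) stays allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for token in ("Google-Extended", "Googlebot", "GPTBot", "ClaudeBot"):
    print(f"{token:16} may fetch /: {parser.can_fetch(token, '/')}")
```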

How to Test with TestMyGEO

TestMyGEO analyzes your robots.txt configuration and identifies any AI crawlers being blocked.
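A check like this can be sketched in a few lines. The helper below is hypothetical and is not how TestMyGEO works internally. It parses a robots.txt string with Python's urllib.robotparser and lists which of the AI crawler tokens named in this entry are disallowed from a given path, the homepage by default.

```python
from urllib.robotparser import RobotFileParser

# AI crawler tokens named in this glossary entry.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def blocked_ai_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawler tokens that robots_txt disallows for `path`.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, path)]


example = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_crawlers(example))  # ['GPTBot']
```

To check a live site, RobotFileParser also offers set_url() and read() to download the file. This sketch leaves networking out so it runs offline.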

Best Practices

Allow major AI crawlers by default
Block a crawler only when you have a specific reason
Update robots.txt as new AI crawlers appear
Test changes before deploying them
Monitor server access logs for crawler activity
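Testing before deployment can be as simple as comparing the current and proposed files. In this sketch, the helper names and sample paths are hypothetical. It reports every AI crawler and path whose access would change. One caveat applies: urllib.robotparser uses the first matching rule in file order, while RFC 9309 and Google use the most specific (longest) match. Results can therefore differ when Allow and Disallow rules overlap.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def _parser(text: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def access_changes(old_txt: str, new_txt: str, paths: list[str]) -> list[tuple]:
    """List (agent, path, before, after) for every crawler/path whose access flips."""
    old, new = _parser(old_txt), _parser(new_txt)
    changes = []
    for agent in AI_CRAWLERS:
        for path in paths:
            before, after = old.can_fetch(agent, path), new.can_fetch(agent, path)
            if before != after:
                changes.append((agent, path, before, after))
    return changes


current = """\
User-agent: *
Disallow: /private/
"""

# Proposed edit: the "*" group now disallows the whole site.
proposed = """\
User-agent: *
Disallow: /
"""

for change in access_changes(current, proposed, ["/", "/blog/", "/private/"]):
    print(change)
```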

Common Mistakes to Avoid

Blanket-blocking all bots (User-agent: * with Disallow: /)
Accidentally blocking AI crawlers with overly broad rules
Not updating the file as new AI crawlers appear
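The sketch below reproduces these mistakes with Python's urllib.robotparser. The agents and paths are illustrative. It shows three things:

- A blanket block catches an AI crawler that has no group of its own.
- An empty Disallow (allow everything) and "Disallow: /" (block everything) differ by one character.
- Rules match path prefixes, so "/blog" also blocks "/blog-archive/".

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots_txt and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# 1. Blanket block: "Disallow: /" under "*" also shuts out AI crawlers
#    that have no group of their own.
blanket = "User-agent: *\nDisallow: /\n"

# 2. One character apart: an empty Disallow allows everything,
#    while "Disallow: /" blocks the whole site.
allow_everything = "User-agent: GPTBot\nDisallow:\n"
block_everything = "User-agent: GPTBot\nDisallow: /\n"

# 3. Rules are path prefixes: "/blog" also matches "/blog-archive/".
prefix_rule = "User-agent: ClaudeBot\nDisallow: /blog\n"

print(is_allowed(blanket, "PerplexityBot", "/"))              # False
print(is_allowed(allow_everything, "GPTBot", "/pricing"))     # True
print(is_allowed(block_everything, "GPTBot", "/pricing"))     # False
print(is_allowed(prefix_rule, "ClaudeBot", "/blog-archive/")) # False
```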

Frequently Asked Questions

Which AI crawlers should I allow?

For maximum AI visibility, allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Block one only if you have a specific privacy or competitive concern.
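If you maintain the file programmatically, a small generator keeps the per-crawler decisions in one place. This is a hypothetical helper, not a standard tool. It writes one group per crawler ("Allow: /" or "Disallow: /") and adds a permissive "*" group. It then round-trips the result through urllib.robotparser as a sanity check.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(policy: dict[str, bool]) -> str:
    """Render one group per crawler: True -> "Allow: /", False -> "Disallow: /".

    Hypothetical helper; crawlers not listed fall through to the "*" group,
    which here allows everything.
    """
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}\n")
    groups.append("User-agent: *\nAllow: /\n")
    # Join groups with a blank line between them.
    return "\n".join(groups)


# Maximum AI visibility; flip a value to False only for a specific
# privacy or competitive reason.
policy = {
    "GPTBot": True,
    "ClaudeBot": True,
    "PerplexityBot": True,
    "Google-Extended": True,
}
robots_txt = build_robots_txt(policy)
print(robots_txt)

# Round-trip check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
assert all(parser.can_fetch(agent, "/") == allowed for agent, allowed in policy.items())
```

Setting a crawler's value to False emits "Disallow: /" for that crawler only. The other groups are unaffected.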
