Robots.txt
Robots.txt is a file that tells crawlers, including AI bots, which parts of your site they may access.
Definition
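Robots.txt is a plain-text file served from your website's root (at /robots.txt) that tells web crawlers which URL paths they may access. It follows the Robots Exclusion Protocol and is advisory: well-behaved crawlers honor it, but it does not technically enforce access. For generative engine optimization (GEO), your robots.txt configuration determines whether compliant AI crawlers like GPTBot, ClaudeBot, and PerplexityBot can access your content for training and retrieval.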
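As a rough illustration of how these rules resolve per crawler, the sketch below uses Python's standard-library urllib.robotparser. The robots.txt content, the paths, and the example.com URLs are hypothetical, and real crawlers may handle edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ClaudeBot"):
        for path in ("/blog/", "/private/report", "/admin/"):
            url = f"https://example.com{path}"
            print(agent, path, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

In this sample, GPTBot has its own group, so only that group applies to it and the /admin/ rule does not. ClaudeBot has no group of its own and falls back to the * group.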
Why It Matters
Your robots.txt is the main signal that compliant AI crawlers check before accessing your site. If you block them, your content is unlikely to be included in AI training data or real-time retrieval.
How to Test with TestMyGEO
TestMyGEO analyzes your robots.txt configuration and identifies any AI crawlers being blocked.
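For a quick local check of the same kind, a minimal sketch follows. It is an assumption about what such an audit might look like, not a description of how TestMyGEO works internally. The function name blocked_ai_crawlers and the crawler list are illustrative choices.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article; extend the tuple as needed.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def blocked_ai_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawler tokens that may not fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    # Hypothetical configuration that blocks only GPTBot.
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(blocked_ai_crawlers(sample))
```

To check a live site, you could download the file yourself and pass its text to the function. The sketch deliberately works on text so that it can run without network access.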
Best Practices
- Allow major AI crawlers by default
- Block a crawler only when you have a specific reason
- Keep robots.txt updated with new crawlers
- Test changes before deployment
- Monitor crawler access logs
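Monitoring access logs can be as simple as counting requests whose user-agent string contains a known crawler token. The sketch below assumes each log line includes the user-agent string somewhere in it. The sample lines are simplified and hypothetical, not real crawler user-agent strings. Google-Extended is left out because it is a robots.txt control token and does not appear as a request user agent.

```python
from collections import Counter

# Tokens expected to appear inside the user-agent field of a request.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_ai_crawler_hits(log_lines):
    """Count log lines per AI crawler token (case-insensitive substring match)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    # Simplified, hypothetical access-log lines.
    sample_log = [
        '203.0.113.5 "GET /blog/ HTTP/1.1" 200 "Mozilla/5.0 (compatible; GPTBot/1.0)"',
        '203.0.113.9 "GET /robots.txt HTTP/1.1" 200 "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
        '198.51.100.7 "GET / HTTP/1.1" 200 "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    for token, hits in count_ai_crawler_hits(sample_log).items():
        print(token, hits)
```

Substring matching is a deliberate simplification. User-agent strings can be spoofed, so treat these counts as a rough signal rather than proof of crawler identity.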
Common Mistakes to Avoid
- Blanket-blocking all bots (for example, User-agent: * with Disallow: /)
- Accidentally blocking AI crawlers
- Not updating for new AI crawlers
Frequently Asked Questions
Which AI crawlers should I allow?
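For maximum AI visibility, allow GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Note that Google-Extended is not a separate crawler: it is a robots.txt control token that governs whether content Google has already crawled may be used for its Gemini models. Block any of these only if you have specific privacy or competitive concerns.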
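A permissive configuration for these tokens can be sketched as a single robots.txt group. The helper below, build_robots_txt, is a hypothetical name, and /drafts/ is an example path. It generates the file text and then checks it with Python's standard-library parser.

```python
from urllib.robotparser import RobotFileParser

AI_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def build_robots_txt(tokens=AI_TOKENS, blocked_paths=()):
    """Build one robots.txt group that allows the tokens site-wide,
    except for any paths listed in blocked_paths."""
    lines = [f"User-agent: {token}" for token in tokens]
    # Disallow rules go first: some parsers, including Python's, apply the
    # first matching rule rather than the longest match.
    lines += [f"Disallow: {path}" for path in blocked_paths]
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_robots_txt(blocked_paths=("/drafts/",))
    print(text)

    parser = RobotFileParser()
    parser.parse(text.splitlines())
    print(parser.can_fetch("ClaudeBot", "/blog/"))
    print(parser.can_fetch("GPTBot", "/drafts/post"))
```

Listing several User-agent lines above one set of rules applies those rules to every listed token, which keeps the file short when all AI crawlers share the same policy.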