Robots.txt Generator

Build a robots.txt file to control how search engines and AI bots crawl your website.

How to Use the Robots.txt Generator

  1. Select a preset to quickly start with a common configuration. "Allow All" permits all crawlers, "Block All" disallows everything, "Block AI Bots" blocks popular AI training crawlers, and "Standard" allows all crawlers while blocking admin and API directories.
  2. Customize user-agent blocks by selecting a bot from the dropdown or typing a custom user-agent name. Add Allow or Disallow rules with the path input for each block. You can add multiple rules per user-agent.
  3. Add your sitemap URL in the sitemap field to help search engines discover your content efficiently.
  4. Copy or download the generated robots.txt file and upload it to the root directory of your website so it is accessible at yourdomain.com/robots.txt.
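For instance, the "Standard" preset described in step 1 produces output along these lines (exact paths and the sitemap URL depend on your configuration):

```
User-agent: *
Disallow: /admin/
Disallow: /api/

Sitemap: https://yourdomain.com/sitemap.xml
```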

Understanding Robots.txt

The robots.txt file is one of the oldest and most fundamental tools for controlling search engine behavior on your website. Following the Robots Exclusion Protocol established in 1994, this simple text file sits in your website's root directory and provides instructions to web crawlers about which areas of your site they may or may not access. While it operates on an honor system and cannot physically block access, all major search engines and legitimate crawlers respect robots.txt directives.

User-Agent Directives

Each block in a robots.txt file begins with a User-agent directive that specifies which crawler the rules apply to. The wildcard asterisk (*) matches all crawlers. Specific user-agents include Googlebot for Google, Bingbot for Bing, and newer entries like GPTBot for OpenAI's crawler. When a crawler visits your site, it reads robots.txt and follows the rules for its specific user-agent. If no specific rules exist, it falls back to the wildcard rules.
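As an illustration of this fallback behavior, with hypothetical paths:

```
# Googlebot reads only its own block and ignores the wildcard rules
User-agent: Googlebot
Disallow: /no-google/

# Every other crawler falls back to this block
User-agent: *
Disallow: /private/
```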

Allow and Disallow Rules

The Disallow directive tells crawlers not to access a specified path, while the Allow directive explicitly permits access to a path that might otherwise be blocked by a broader Disallow rule. Under the modern specification (RFC 9309), the order of rules within a group does not matter: the most specific matching rule, meaning the one with the longest matching path, takes precedence. For example, you might disallow an entire directory but allow access to a specific important file within it. An empty Disallow value means everything is allowed.
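A minimal sketch of this pattern, with hypothetical paths: the Allow rule has the longer matching path, so it wins for that one file while the rest of the directory stays blocked.

```
User-agent: *
Disallow: /downloads/
Allow: /downloads/catalog.pdf
```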

Blocking AI Training Crawlers

With the rise of large language models, many website owners want to prevent AI companies from using their content for training. Common AI crawlers include GPTBot and ChatGPT-User from OpenAI, CCBot from Common Crawl (used by many AI companies), Google-Extended from Google (for Gemini training), and Anthropic's crawlers. This generator includes a preset that blocks all major AI training bots while still allowing search engine indexing.
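A sketch of what such a preset generates, using the user-agent names listed above; each bot gets its own block with a site-wide Disallow:

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```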

Crawl-Delay Directive

The Crawl-delay directive tells crawlers to wait a specified number of seconds between requests. This is useful for reducing server load on resource-constrained hosting. Note that Googlebot does not support the Crawl-delay directive; instead, you should configure crawl rate in Google Search Console. Bing, Yandex, and other crawlers do respect this directive. A value of 10 means the crawler will wait 10 seconds between requests. For a complete technical SEO setup, pair your robots.txt with our Meta Tag Generator and Canonical Checker to ensure search engines index your site correctly.
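A minimal example, asking Bingbot to wait 10 seconds between requests:

```
User-agent: Bingbot
Crawl-delay: 10
```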

Common Mistakes to Avoid

  • Blocking CSS and JavaScript files prevents search engines from rendering your pages properly, which can hurt your rankings.
  • Using robots.txt for security is ineffective because the file is publicly readable, and blocked URLs may still appear in search results without content.
  • Forgetting the trailing slash on directory paths means the rule may not match as expected.
  • Not testing your robots.txt after deployment can lead to accidental blocking of important content. Use Google Search Console to test and validate your file.
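One way to test rules before (or after) deployment is Python's standard-library `urllib.robotparser`. The sketch below parses a hypothetical robots.txt inline; in practice you would point `set_url()` at your live yourdomain.com/robots.txt and call `read()`. Note that Python's parser applies rules in file order (first match wins) rather than the longest-match rule used by Google, so the Allow line is listed before the broader Disallow here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
rules = """\
User-agent: *
Allow: /admin/help.html
Disallow: /admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "/admin/secret.html"))  # False: blocked by Disallow
print(parser.can_fetch("*", "/admin/help.html"))    # True: Allow rule matches first
print(parser.can_fetch("*", "/blog/post"))          # True: no rule matches
```

Google Search Console remains the authoritative check for how Googlebot specifically interprets your file.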

Frequently Asked Questions

What is a robots.txt file?
A robots.txt file is a text file placed in the root directory of a website that tells search engine crawlers which pages or sections they are allowed or not allowed to access. It follows the Robots Exclusion Protocol and is the first file crawlers check before indexing your site.

Can I block AI bots with robots.txt?
Yes. AI companies like OpenAI, Anthropic, and Common Crawl use specific user-agent names for their crawlers. You can block GPTBot, ChatGPT-User, CCBot, Google-Extended, and other AI crawlers by adding Disallow rules for their user-agents in your robots.txt file. This tool includes a preset for blocking popular AI bots.

Does robots.txt prevent pages from being indexed?
Not exactly. Robots.txt prevents crawlers from accessing your pages, but Google may still index the URL if other sites link to it. The listing will appear without a description. To fully prevent indexing, use the noindex meta tag or X-Robots-Tag HTTP header instead of or in addition to robots.txt.

Where should I place my robots.txt file?
The robots.txt file must be placed in the root directory of your domain, accessible at yourdomain.com/robots.txt. Each subdomain needs its own robots.txt file. The file must be served with the text/plain content type and should be accessible via HTTP without authentication.

What does the Sitemap directive do?
The Sitemap directive tells search engines where to find your XML sitemap, which lists all the pages you want indexed. This helps crawlers discover and index your content more efficiently. You can include multiple Sitemap directives. The sitemap URL should be the full absolute URL including the protocol.
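For example, with hypothetical sitemap URLs (the directive is independent of any user-agent block):

```
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/news-sitemap.xml
```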