LLMs.txt: The Complete 2026 Guide for AI Search Optimization

Updated May 2026 · 9 min read · AI Search

Master /llms.txt — the file telling ChatGPT, Claude, Perplexity, and Gemini which of your pages matter most. Spec, structure, generator, and common mistakes.

llms.txt is a curated markdown index at your site root that tells AI agents which pages matter most when answering questions about your site.

What is llms.txt?

llms.txt is a markdown file at the root of your domain (example.com/llms.txt) that tells AI agents — ChatGPT, Claude, Perplexity, Gemini — which of your pages are worth loading when answering a user query about your site. It was proposed by Jeremy Howard at Answer.AI in September 2024. By mid-2026 it's been adopted by Anthropic, Vercel, Cloudflare, Mintlify, Hugging Face, and thousands of smaller sites.

Think of it as the AI-era counterpart to robots.txt — but with an inverted purpose. Robots.txt is gatekeeping: "don't crawl here." LLMs.txt is curation: "here's what matters most."

The full spec, in one example

# Acme Documentation

> Acme is the developer platform for building real-time collaborative apps. This file points AI agents at our most useful documentation.

## Getting Started

- [Quickstart](https://acme.dev/docs/quickstart): Five-minute setup from npm install to first request
- [Authentication](https://acme.dev/docs/auth): API keys, OAuth, session tokens
- [Examples](https://acme.dev/docs/examples): Working code samples in 8 languages

## API Reference

- [REST API](https://acme.dev/api/rest): Full endpoint catalog
- [WebSocket API](https://acme.dev/api/ws): Real-time messaging spec
- [Webhooks](https://acme.dev/api/webhooks): Event delivery

## Guides

- [Deployment patterns](https://acme.dev/guides/deploy)
- [Scaling to 100k users](https://acme.dev/guides/scale)
- [Security checklist](https://acme.dev/guides/security)

## Optional

- [Changelog](https://acme.dev/changelog): Skippable for tight context budgets
- [Press coverage](https://acme.dev/press)
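To make the structure concrete, here is a minimal Python sketch that reads a file shaped like the example above into a title, a summary, and named link sections. The names `LlmsTxt` and `parse_llms_txt` are illustrative, not part of any spec or library, and the regex covers only the `- [text](url): description` link shape shown above.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Matches "- [text](url)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^-\s*\[(?P<text>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: Optional[str] = None
    summary: Optional[str] = None
    # section name -> list of (link text, url, description)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current: Optional[str] = None  # name of the H2 section being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections.setdefault(current, [])
        elif line.startswith("# ") and doc.title is None:
            doc.title = line[2:].strip()
        elif line.startswith(">") and doc.summary is None and current is None:
            # Only a blockquote that appears before the first H2 is the summary.
            doc.summary = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["text"], match["url"], match["desc"] or "")
                )
    return doc
```

Calling `parse_llms_txt(open("llms.txt", encoding="utf-8").read())` on the Acme example would give four sections, with `Optional` kept as an ordinary key so a caller can decide whether to drop it.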

What's required, what's recommended

Required: An H1 with your site or project name. This is the only required element of the entire spec.
Recommended: A blockquote summary (> ) on the next non-empty line after the H1, giving the AI one sentence of project framing.
Recommended: One or more H2 sections containing markdown link lists. Without these, the file is just a title.
Optional special section: An H2 literally named ## Optional — content AI agents can skip when context is tight.
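File location: /llms.txt at the site root, served over HTTPS. The llmstxt.org proposal mentions a subpath as optional, so treat a file like /docs/llms.txt as a supplement to the root file, not a replacement for it.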
Content-Type: Serve as text/markdown or text/plain. Never text/html.
Size: Under 50KB is ideal and under 150KB is acceptable. At 500KB or more the file defeats the curation purpose, so use /llms-full.txt for the bulk content instead.
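The rules above can be turned into a small checker. The sketch below is a hedged example, not an official validator: `check_structure` is a hypothetical name, the size thresholds are the ones from this guide, and "KB" is assumed to mean 1024 bytes. Only the missing H1 is reported as an error, since it is the only required element.

```python
KB = 1024  # assumption: the guide's "KB" is read as 1024 bytes


def check_structure(text: str) -> dict[str, list[str]]:
    """Return errors (required items) and warnings (recommended items)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    errors: list[str] = []
    warnings: list[str] = []

    # Required: the first non-empty line is an H1.
    if not lines or not lines[0].startswith("# "):
        errors.append("missing H1")

    # Recommended: a blockquote summary on the next non-empty line.
    if len(lines) < 2 or not lines[1].startswith(">"):
        warnings.append("missing blockquote summary")

    # Recommended: at least one H2 section and at least one markdown link item.
    has_h2 = any(ln.startswith("## ") for ln in lines)
    has_link = any(ln.startswith("- [") for ln in lines)
    if not (has_h2 and has_link):
        warnings.append("no H2 section with markdown links")

    # Size guidance from this guide, measured on the UTF-8 encoded bytes.
    size = len(text.encode("utf-8"))
    if size >= 500 * KB:
        warnings.append("file is 500KB or more, move bulk to /llms-full.txt")
    elif size >= 150 * KB:
        warnings.append("file is 150KB or more")

    return {"errors": errors, "warnings": warnings}
```

A file that passes returns two empty lists. Anything in `errors` means the file is missing its one required element.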

llms.txt vs llms-full.txt

A sibling file, /llms-full.txt, is commonly published alongside it. It's used differently:

llms.txt = curated index. Markdown table of contents pointing at your most valuable URLs. Stays under 50KB.
llms-full.txt = full content dump. Inlines the actual text of every page in one large markdown file. Can be megabytes.

Most large adopters (Anthropic, Mintlify) ship both. AI agents pick whichever fits their context window. Our checker treats /llms-full.txt presence as a positive signal.

llms.txt vs robots.txt

	robots.txt	llms.txt
Purpose	Access control — "don't go here"	Content curation — "here's what matters"
Audience	Search-engine crawlers	AI agents at inference time
Format	Plain-text directives	Markdown
When read	Before every crawl	When the AI answers a query about your site
Defined by	RFC 9309	llmstxt.org (community spec)

They're complementary, not alternatives. Publish both — and make sure your robots.txt allows the AI bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) that read your llms.txt.
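One way to confirm the two files agree is to run your robots.txt through Python's standard `urllib.robotparser` and ask whether each bot may fetch /llms.txt. This is a minimal sketch: the bot list is the one named above, `blocked_bots` is a hypothetical helper name, and `https://example.com/llms.txt` is a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this guide.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_bots(robots_txt: str, url: str = "https://example.com/llms.txt") -> list[str]:
    """Return the bots that this robots.txt text would stop from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]
```

An empty list means none of the listed bots is blocked from the file. Pass the URL of any page you link from llms.txt to run the same check for that page.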

Common mistakes

Missing H1. The spec's only required element. Without it, AI agents may skip the file entirely.
Server returns HTML. A SPA route or custom 404 catches /llms.txt requests and returns HTML. AI agents expect markdown — they bail.
Bare URLs. - https://example.com/docs works but wastes the description slot. Use - [Docs](https://example.com/docs): Getting started.
HTML anchors. Copy-pasting <a href> from a webpage. Markdown only.
"Click here" descriptions. Tells the AI nothing. Use descriptive link text.
Auth-walled links. Pointing AI at a 401/login page is broken — anonymous AI agents can't authenticate.
Robots-blocked links. Linking to a URL that your own robots.txt disallows. Pick one rule or the other.
Blocking AI bots in robots.txt while publishing llms.txt. Contradictory — the AI agents the llms.txt is for can't read it.
File too large. Over 500KB defeats the curation purpose. Move bulk to /llms-full.txt.
Stale content. Treat llms.txt like a sitemap — refresh quarterly or when navigation changes.
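Several of these mistakes are mechanical enough to catch with a short lint pass. The sketch below flags bare URLs, HTML anchors, and vague link text. The function name `lint_links` and the word list in `VAGUE` are assumptions chosen for illustration, and the checks are deliberately simple pattern matches, not a markdown parser.

```python
import re

MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
BARE_URL = re.compile(r"^-\s*https?://\S+$")
# Illustrative list of link texts that say nothing about the target.
VAGUE = {"click here", "here", "link", "read more"}


def lint_links(text: str) -> list[str]:
    """Return one message per problem found, prefixed with the line number."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if "<a " in line.lower():
            problems.append(f"line {number}: HTML anchor, use a markdown link")
        if BARE_URL.match(line):
            problems.append(f"line {number}: bare URL without link text")
        for label, _url in MD_LINK.findall(line):
            if label.strip().lower() in VAGUE:
                problems.append(f"line {number}: vague link text {label!r}")
    return problems
```

Auth-walled, robots-blocked, and stale links need live requests, so they are left out of this offline pass.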

How AI agents actually use it

When you ask ChatGPT "what does Acme's pricing look like?" the agent typically:

Checks https://acme.dev/llms.txt
If found, parses the H2 sections and link list
Fetches a few high-relevance URLs (e.g., pricing-related)
Answers using the curated content
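The steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular AI product is implemented: `pick_links` is a hypothetical function, relevance is plain keyword overlap between the query and each link's section, text, and description, and links under `## Optional` are skipped unless asked for.

```python
import re

LINK = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_links(llms_txt: str, query: str, limit: int = 3,
               include_optional: bool = False) -> list[str]:
    """Return up to `limit` URLs whose section, text, or description match the query."""
    query_words = _words(query)
    scored = []
    section = ""
    for raw in llms_txt.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK.match(line)
        if not match:
            continue
        if section.lower() == "optional" and not include_optional:
            continue  # skippable content when context is tight
        text, url, desc = match.group(1), match.group(2), match.group(3) or ""
        score = len(query_words & _words(f"{section} {text} {desc}"))
        if score > 0:
            scored.append((score, url))
    # Stable sort: ties keep the order in which the file lists them.
    scored.sort(key=lambda pair: -pair[0])
    return [url for _, url in scored[:limit]]
```

The sketch shows why descriptions matter: a link with no description can only match on its section name and link text.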

Without an llms.txt, the AI falls back to whatever Bing or Google has indexed — which is noisy, often outdated, and missing the framing you wanted to surface. The opportunity cost is real: AI search is becoming the front door, and llms.txt is the welcome mat.

Generating yours

Hand-curating an llms.txt for a 500-page docs site is brutal. The InstaRank SEO /llms-txt-checker tool includes a generator: it crawls your site, picks the top ~50 pages, and produces a spec-conformant llms.txt with curated H2 sections, blockquote summary, and proper markdown links. Edit before publishing.

Validation checklist

Before you ship, confirm:

File at https://yourdomain.com/llms.txt (HTTPS root)
HTTP 200 response
Content-Type: text/markdown or text/plain
Starts with # Site Name (H1)
Has a > summary blockquote on the next non-empty line
Has at least one ## Section with markdown links
All linked URLs return 200 (no broken links)
No duplicate URLs
No auth-walled or paywalled URLs
File size under 50KB (or shipped alongside llms-full.txt)
Your robots.txt allows GPTBot, ClaudeBot, PerplexityBot, Google-Extended
Optional: <link rel="llms" href="/llms.txt"> in your homepage HTML head
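A few checklist items can be verified from a single HTTP response. The sketch below takes the status code, the Content-Type header, and the body as plain arguments, so it works with whatever HTTP client you already use. `check_response` is a hypothetical name, and the allowed media types are the two this guide recommends.

```python
import re
from collections import Counter

ALLOWED_TYPES = {"text/markdown", "text/plain"}


def check_response(status: int, content_type: str, body: str) -> list[str]:
    """Check status, media type, and duplicate link URLs for an llms.txt response."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")

    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or 'missing'}")

    # Collect every markdown link target and report the ones that repeat.
    urls = re.findall(r"\]\((https?://[^)\s]+)\)", body)
    for url, count in Counter(urls).items():
        if count > 1:
            problems.append(f"duplicate URL: {url}")
    return problems
```

A `text/html` media type here is the usual sign that a SPA route or custom 404 page is answering the request in place of the file.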

Frequently asked questions

Should I block AI bots OR publish llms.txt?

Pick one. Publishing llms.txt while blocking GPTBot / ClaudeBot in robots.txt is self-defeating — the AI agents the file is meant for can’t read it.

How often should I update it?

Treat it like a sitemap. Update when major content launches, navigation changes, or quarterly at minimum so AI agents stay aligned with your current site structure.

What if I have a multi-language site?

The spec doesn’t define multi-language behavior. Common patterns: one master llms.txt with cross-language sections, or per-locale subpath files (which the spec doesn’t require but doesn’t forbid either).

Does it work for marketing sites or only docs?

Both. Documentation sites get the most obvious value (AI agents directly answer dev questions), but any site that wants to control how AI describes its product benefits — pricing, features, case studies, comparisons.

What about ai.txt or robots-ai.txt — should I use those instead?

ai.txt is a separate proposal focused on AI training opt-out, which is a different problem. llms.txt is the one with real adoption for inference-time content guidance, and the two can coexist.
