LLMs.txt: The Complete 2026 Guide for AI Search Optimization
Master /llms.txt — the markdown file that tells AI agents (ChatGPT, Claude, Perplexity, Gemini) which of your pages matter most. Spec, structure, generator, common mistakes, and how it differs from robots.txt.
Published 2026-05-26 · InstaRank SEO
What is llms.txt?
llms.txt is a markdown file at the root of your domain (example.com/llms.txt) that tells AI agents — ChatGPT, Claude, Perplexity, Gemini — which of your pages are worth loading when answering a user query about your site. It was proposed by Jeremy Howard at Answer.AI in September 2024. By mid-2026 it's been adopted by Anthropic, Vercel, Cloudflare, Mintlify, Hugging Face, and thousands of smaller sites.
Think of it as the AI-era counterpart to robots.txt — but with an inverted purpose. Robots.txt is gatekeeping: "don't crawl here." LLMs.txt is curation: "here's what matters most."
The full spec, in one example
    # Acme Documentation

    > Acme is the developer platform for building real-time collaborative apps. This file points AI agents at our most useful documentation.

    ## Getting Started

    - [Quickstart](https://acme.dev/docs/quickstart): Five-minute setup from npm install to first request
    - [Authentication](https://acme.dev/docs/auth): API keys, OAuth, session tokens
    - [Examples](https://acme.dev/docs/examples): Working code samples in 8 languages

    ## API Reference

    - [REST API](https://acme.dev/api/rest): Full endpoint catalog
    - [WebSocket API](https://acme.dev/api/ws): Real-time messaging spec
    - [Webhooks](https://acme.dev/api/webhooks): Event delivery

    ## Guides

    - [Deployment patterns](https://acme.dev/guides/deploy)
    - [Scaling to 100k users](https://acme.dev/guides/scale)
    - [Security checklist](https://acme.dev/guides/security)

    ## Optional

    - [Changelog](https://acme.dev/changelog): Skippable for tight context budgets
    - [Press coverage](https://acme.dev/press)
What's required, what's recommended
- Required: An H1 with your site or project name. This is the only required element of the entire spec.
- Recommended: A blockquote summary (>) on the line after the H1, giving the AI one sentence of project framing.
- Recommended: One or more H2 sections containing markdown link lists. Without these, the file is just a title.
- Optional special section: An H2 literally named ## Optional, holding content AI agents can skip when context is tight.
- File location: /llms.txt at the HTTPS root. The spec optionally allows a subpath such as /docs/llms.txt, but the root is the default location.
- Content-Type: Serve as text/markdown or text/plain. Never text/html.
- Size: <50KB ideal, <150KB OK; ≥500KB defeats the curation purpose. Use /llms-full.txt for the bulk content instead.
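The structural rules above can be checked mechanically. What follows is a minimal Python sketch, not a reference implementation: the names parse_llms_txt and check_structure, the link regex, and the reading of KB as 1024 bytes are all assumptions made for illustration.

```python
import re

# Matches "- [Title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Split llms.txt text into title, summary, and H2 sections of links."""
    title, summary, current = None, None, None
    sections = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith(">") and summary is None and current is None:
            summary = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    {
                        "title": match["title"],
                        "url": match["url"],
                        "desc": match["desc"] or "",
                    }
                )
    return {"title": title, "summary": summary, "sections": sections}


def check_structure(text):
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    first = next((l.strip() for l in text.splitlines() if l.strip()), "")
    if not first.startswith("# "):
        problems.append("required: file must start with an H1")
    parsed = parse_llms_txt(text)
    if parsed["summary"] is None:
        problems.append("recommended: add a blockquote summary after the H1")
    if not any(parsed["sections"].values()):
        problems.append("recommended: add an H2 section with markdown links")
    size = len(text.encode("utf-8"))  # assumption: 1KB = 1024 bytes
    if size >= 500 * 1024:
        problems.append("size: 500KB or more; move bulk content to /llms-full.txt")
    elif size >= 150 * 1024:
        problems.append("size: above the 150KB comfort zone")
    return problems
```

Only the missing H1 is a hard failure under the spec; the other messages are labeled as recommendations so a caller can decide how strict to be.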
llms.txt vs llms-full.txt
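A sibling file, /llms-full.txt, is a widely used convention popularized by documentation tooling rather than something the llmstxt.org proposal itself defines. It's used differently: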
- llms.txt = curated index. Markdown table of contents pointing at your most valuable URLs. Stays under 50KB.
- llms-full.txt = full content dump. Inlines the actual text of every page in one large markdown file. Can be megabytes.
Most large adopters (Anthropic, Mintlify) ship both. AI agents pick whichever fits their context window. Our checker treats /llms-full.txt presence as a bonus signal (+5 points).
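How an agent chooses between the two files is not specified anywhere, so the following is only a sketch of one plausible policy: prefer the full dump when it fits the context budget, otherwise fall back to the index. The function names and the four-characters-per-token estimate are assumptions, not behavior documented by any AI vendor.

```python
def estimate_tokens(text):
    """Rough token estimate. Assumption: about 4 characters per token."""
    return max(1, len(text) // 4)


def pick_file(index_text, full_text, budget_tokens):
    """Return which file a budget-constrained agent might load, or None.

    index_text is the body of /llms.txt; full_text is the body of
    /llms-full.txt, or None when the site does not publish one.
    """
    if full_text is not None and estimate_tokens(full_text) <= budget_tokens:
        return "llms-full.txt"
    if estimate_tokens(index_text) <= budget_tokens:
        return "llms.txt"
    return None
```

Under this policy a small index is what keeps a site readable to agents with tight budgets, which is the practical reason to keep llms.txt short.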
llms.txt vs robots.txt
| | robots.txt | llms.txt |
|---|---|---|
| Purpose | Access control — "don't go here" | Content curation — "here's what matters" |
| Audience | Search-engine crawlers | AI agents at inference time |
| Format | Plain-text directives | Markdown |
| When read | Before every crawl | When the AI answers a query about your site |
| Defined by | RFC 9309 | llmstxt.org (community spec) |
They're complementary, not alternatives. Publish both — and make sure your robots.txt allows the AI bots (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) that read your llms.txt.
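One way to confirm the two files agree is to test your robots.txt against the bot names listed above. This sketch uses Python's standard urllib.robotparser; the function name blocked_ai_bots and the example.com URL are illustrative assumptions, and each real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this article.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_bots(robots_txt, url="https://example.com/llms.txt"):
    """Return the AI bots that this robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Hypothetical robots.txt that blocks one AI bot and allows everyone else.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_bots(EXAMPLE_ROBOTS))
```

An empty result means none of the listed bots is blocked from reading the file. Running the same check against each URL linked from llms.txt catches robots-blocked links as well.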
Common mistakes
- Missing H1. The spec's only required element. Without it, AI agents may skip the file entirely.
- Server returns HTML. A SPA route or custom 404 catches /llms.txt requests and returns HTML. AI agents expect markdown, so they bail.
- Bare URLs. - https://example.com/docs works but wastes the description slot. Use - [Docs](https://example.com/docs): Getting started.
- HTML anchors. Copy-pasting <a href> from a webpage. Markdown only.
- "Click here" descriptions. Tells the AI nothing. Use descriptive link text.
- Auth-walled links. Pointing AI at a 401/login page is broken — anonymous AI agents can't authenticate.
- Robots-blocked links. Linking to a URL that your own robots.txt disallows. Pick one rule or the other.
- Blocking AI bots in robots.txt while publishing llms.txt. Contradictory — the AI agents the llms.txt is for can't read it.
- File too large. Over 500KB defeats the curation purpose. Move bulk to /llms-full.txt.
- Stale content. Treat llms.txt like a sitemap: refresh quarterly or when navigation changes.
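Several of these mistakes are visible in the text of the file itself and can be caught with a small linter. The sketch below covers bare URLs, HTML anchors, vague link text, and duplicate URLs; the function name lint_llms_txt, the regexes, and the list of vague phrases are assumptions chosen for illustration. Checks that need network access, such as auth walls, are out of scope here.

```python
import re

MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
BARE_URL_ITEM = re.compile(r"^\s*-\s*https?://\S+\s*$")
HTML_ANCHOR = re.compile(r"<a\s[^>]*href", re.IGNORECASE)
# Assumption: a small, hand-picked set of link texts that carry no meaning.
VAGUE_TEXT = {"click here", "here", "link", "read more"}


def lint_llms_txt(text):
    """Return (line_number, message) pairs for common llms.txt mistakes."""
    warnings = []
    seen_urls = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if BARE_URL_ITEM.match(line):
            warnings.append((lineno, "bare URL: add link text and a description"))
        if HTML_ANCHOR.search(line):
            warnings.append((lineno, "HTML anchor: use a markdown link"))
        for label, url in MD_LINK.findall(line):
            if label.strip().lower() in VAGUE_TEXT:
                warnings.append((lineno, "vague link text: describe the page"))
            if url in seen_urls:
                warnings.append((lineno, "duplicate URL"))
            seen_urls.add(url)
    return warnings
```

Line numbers are 1-based so the output can be matched against an editor.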
How AI agents actually use it
When you ask ChatGPT "what does Acme's pricing look like?" the agent typically:
- Checks https://acme.dev/llms.txt
- If found, parses the H2 sections and link list
- Fetches a few high-relevance URLs (e.g., pricing-related)
- Answers using the curated content
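The selection step in that flow can be approximated with simple keyword overlap. Real agents are not documented to work this way, so treat this as a toy model: the function rank_links and its scoring rule are assumptions that only illustrate why descriptive link text and descriptions matter.

```python
import re

LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?")


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_links(llms_txt, query, top_n=3):
    """Return up to top_n URLs whose section, title, or description share words with the query."""
    query_words = tokenize(query)
    scored = []
    section = ""
    for raw in llms_txt.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:]
            continue
        match = LINK.search(line)
        if not match:
            continue
        title, url, desc = match.group(1), match.group(2), match.group(3) or ""
        score = len(query_words & tokenize(f"{section} {title} {desc}"))
        if score > 0:
            scored.append((score, url))
    # Highest overlap first; ties keep their order in the file.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:top_n]]
```

A link with no description can only match on its title and section name, which is why a bare URL wastes the description slot.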
Without an llms.txt, the AI falls back to whatever Bing or Google has indexed — which is noisy, often outdated, and missing the framing you wanted to surface. The opportunity cost is real: AI search is becoming the front door, and llms.txt is the welcome mat.
Generating yours
Hand-curating an llms.txt for a 500-page docs site is brutal. The InstaRank SEO /llms-txt-checker tool includes an AI-powered generator: it crawls your site, picks the top ~50 pages, and produces a spec-conformant llms.txt with curated H2 sections, blockquote summary, and proper markdown links. Edit before publishing.
The generator uses our centralized AI service (DeepSeek primary, Gemini fallback). 50 credits per run, refunded automatically if generation fails.
Validation checklist
Before you ship, confirm:
- File at https://yourdomain.com/llms.txt (HTTPS root)
- HTTP 200 response
- Content-Type: text/markdown or text/plain
- Starts with # Site Name (H1)
- Has a > summary blockquote on the next non-empty line
- Has at least one ## Section with markdown links
- All linked URLs return 200 (no broken links)
- No duplicate URLs
- No auth-walled or paywalled URLs
- File size under 50KB (or shipped alongside llms-full.txt)
- Your robots.txt allows GPTBot, ClaudeBot, PerplexityBot, Google-Extended
- Optional: <link rel="llms" href="/llms.txt"> in your homepage HTML head
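The response-level items on this checklist (status, Content-Type, size, and the HTML-instead-of-markdown failure) can be scripted with the Python standard library. This is a sketch under stated assumptions: check_response and fetch_and_check are hypothetical helper names, 50KB is read as 50 × 1024 bytes, and the HTML sniffing is a simple heuristic.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

ALLOWED_TYPES = {"text/markdown", "text/plain"}


def check_response(status, content_type, body):
    """Check an llms.txt HTTP response; body is bytes. Returns a list of failures."""
    failures = []
    if status != 200:
        failures.append(f"expected HTTP 200, got {status}")
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        failures.append(f"unexpected Content-Type: {media_type or 'missing'}")
    head = body[:200].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        failures.append("body looks like HTML, not markdown")
    if len(body) > 50 * 1024:  # assumption: 1KB = 1024 bytes
        failures.append("larger than 50KB; consider shipping llms-full.txt too")
    return failures


def fetch_and_check(url):
    """Fetch url over the network and run check_response on the result."""
    try:
        with urlopen(url, timeout=10) as resp:
            return check_response(
                resp.status, resp.headers.get("Content-Type", ""), resp.read()
            )
    except HTTPError as err:
        # urlopen raises on 4xx/5xx; the error object still carries the response.
        return check_response(
            err.code, err.headers.get("Content-Type", ""), err.read()
        )
```

Calling fetch_and_check("https://yourdomain.com/llms.txt") returns an empty list when these four checks pass. Link health, duplicates, and robots.txt rules still need their own checks.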
FAQ
Should I block AI bots OR publish llms.txt?
Pick one. Publishing llms.txt while blocking GPTBot / ClaudeBot in robots.txt is self-defeating — the AI agents the file is meant for can't read it.
How often should I update it?
Treat it like a sitemap. Update when major content launches, navigation changes, or quarterly at minimum so AI agents stay aligned with your current site structure.
What if I have a multi-language site?
The spec doesn't define multi-language behavior. Two common patterns are one master llms.txt with cross-language sections, or per-locale subpath files (which the spec doesn't require but doesn't forbid either).
Does it work for marketing sites or only docs?
Both. Documentation sites get the most obvious value (AI agents directly answer dev questions), but any site that wants to control how AI describes its product benefits — pricing, features, case studies, comparisons.
What about ai.txt or robots-ai.txt — should I use those instead?
ai.txt is a separate proposal focused on AI training opt-out (different problem). llms.txt is the one with real adoption for inference-time content guidance. They can coexist — they solve different problems.