llms.txt: The Complete 2026 SEO Guide for AI Search
When a user asks ChatGPT "what does Stripe's pricing look like?", the AI doesn't scroll Google results. It loads stripe.com/llms.txt first. That single file decides what context the AI uses to answer — and what it says about your product. llms.txt is the new front door, and most sites still don't have one. This guide covers everything you need to ship a spec-conformant llms.txt in 2026, from the original spec by Jeremy Howard at Answer.AI to the 14-point validation checklist we built into our free LLMs.txt Checker.
TL;DR Summary
- llms.txt is a markdown file at yourdomain.com/llms.txt that tells AI agents which pages matter most.
- It was proposed by Jeremy Howard at Answer.AI in September 2024 and has already been adopted by Anthropic, Vercel, Cloudflare, Mintlify, and Hugging Face.
- The format is pure markdown: H1 (required) + blockquote summary + H2 sections of curated links.
- Different from robots.txt: robots.txt gatekeeps access. llms.txt actively curates content for AI to load at inference time.
- Pair with /llms-full.txt for AI agents that want full content depth in one fetch.
- Keep it under 50KB ideally. Over 500KB defeats the curation purpose — use llms-full.txt for bulk.
- Don't publish llms.txt while blocking AI bots in robots.txt — that's self-defeating.
- Use our LLMs.txt Checker to validate yours across 14 spec-grade parameters. AI-powered generator included.
1. Why llms.txt Matters in 2026
For 25 years, SEO meant one thing: ranking on Google. You wrote content, you hoped Google would crawl it, and you measured success in SERP positions and organic clicks. That world is fading fast.
By mid-2026, roughly half of search-style queries go through AI assistants instead. Someone asks ChatGPT "what's the best PostgreSQL ORM for Node.js?", Claude "how does Stripe's pricing compare to Adyen's?", or Perplexity "is Vercel cheaper than AWS Amplify for my use case?". The AI answers directly. The user never sees a SERP. The question for site owners stopped being "does Google know about us?" — and became "does the AI know what to load when our domain comes up?".
That's exactly what llms.txt solves. You publish one markdown file at yourdomain.com/llms.txt. AI agents that support the standard check it, follow the curated links you list, and use that content to answer questions about you. Adoption is growing fast.
The compounding cost of not having one
Without llms.txt, an AI agent answering a question about your domain falls back to whatever Google or Bing happens to have indexed, which is usually:
- Outdated. Search engines might still surface a pricing page from 2024.
- Noisy. The AI might cite your customer review section instead of your actual feature list.
- Wrong. Without curation, AI agents have surfaced competitor mentions or blog comments as if they were your official position.
- Missing the framing you wanted. You lose control of the first impression.
Shipping an llms.txt is a few hours of work. The asset compounds over the next 5+ years of AI search adoption. That's about as good a leverage ratio as you'll find in technical SEO.
2. The Complete llms.txt Spec
Jeremy Howard at Answer.AI proposed the standard in September 2024. The spec lives at llmstxt.org and is intentionally minimal — pure markdown, no XML, no JSON schema. That minimalism is the point: any LLM can parse it without specialized libraries.
The full structure
```markdown
# Site Name

> One-sentence summary: what the site is and who it's for.

Optional free-form prose between the blockquote and the first H2.

## Section Name
- [Link title](https://example.com/path): Optional description
- [Another link](https://example.com/other)

## Another Section
- [Page title](https://example.com/section/page)

## Optional
- [Lower-priority page](https://example.com/changelog): Skippable for tight context
```
What's required
The spec is almost aggressively minimal:
- H1 with site or project name — this is the only required element of the entire spec. Everything else is optional, but recommended.
What's strongly recommended
- Blockquote summary (>) directly after the H1. It gives the AI one sentence of project framing before it scans the link list. Without it, the AI infers context from URLs, which is unreliable.
- One or more H2 sections with markdown link lists. Without these, the file is just a title and provides zero navigation value to AI agents.
- An H2 literally named ## Optional, holding content AI agents can skip when context is tight. This section is spec-defined.
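The following minimal Python sketch makes the required/recommended split concrete. It checks a file that is already loaded into memory. It assumes section names are matched literally, and that a link item is a - or * list entry starting with [title](url). check_structure and its result keys are illustrative names, not part of the spec or of our checker.

```python
import re

# A markdown list item whose first token is a [title](url) link.
LINK_ITEM = re.compile(r"^\s*[-*]\s+\[[^\]]+\]\([^)\s]+\)")

def check_structure(text: str) -> dict:
    """Report whether an llms.txt body has the H1, blockquote and H2 link lists."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    # Spec-required: the first non-blank line is an H1 ("# Name", not "## Name").
    has_h1 = bool(lines) and lines[0].startswith("# ")
    # Recommended: the next non-blank line after the H1 is a blockquote summary.
    has_summary = has_h1 and len(lines) > 1 and lines[1].startswith(">")
    sections_with_links = []
    current = None
    for ln in lines:
        if ln.startswith("## "):
            current = ln[3:].strip()
        elif current and LINK_ITEM.match(ln) and current not in sections_with_links:
            sections_with_links.append(current)
    return {
        "has_h1": has_h1,
        "has_summary": has_summary,
        "sections_with_links": sections_with_links,
        # True only when an "## Optional" section actually contains links.
        "has_optional": "Optional" in sections_with_links,
    }
```

A file that opens directly with ## Documentation, a mistake covered later in this guide, reports has_h1 as False.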
File location rules
- HTTPS root: https://yourdomain.com/llms.txt
- The spec also allows an optional subpath such as /docs/llms.txt. However, agents generally check only the root, so a subpath file is unlikely to be discovered unless the root file links to it.
- Must respond with HTTP 200
- Served with Content-Type: text/markdown or text/plain
- Never text/html, which signals to AI agents that you mis-routed the request
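These location rules are easy to verify from a script. The sketch below assumes Python 3.9+ and uses only the standard library. It separates the pure checks from the network call so the checks can be tested offline. Two details are our own illustrative choices, not part of any spec:

- the HTML sniff, which looks for <!doctype html or <html at the start of the body;
- the User-Agent string.

```python
from urllib.request import Request, urlopen

ACCEPTED_TYPES = ("text/markdown", "text/plain")

def check_response(status: int, content_type: str, body: str) -> list[str]:
    """Return problems with a /llms.txt response; an empty list means it passed."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Compare only the media type, ignoring parameters such as "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type {media_type!r}")
    # An HTML body usually means a SPA route or custom 404 page caught the request.
    head = body.lstrip()[:15].lower()
    if head.startswith(("<!doctype html", "<html")):
        problems.append("body looks like HTML, not markdown")
    return problems

def fetch_and_check(domain: str) -> list[str]:
    """Fetch https://<domain>/llms.txt and check it (4xx/5xx raise urllib.error.HTTPError)."""
    req = Request(f"https://{domain}/llms.txt", headers={"User-Agent": "llms-txt-sketch/0.1"})
    with urlopen(req, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
        return check_response(resp.status, resp.headers.get("Content-Type", ""), body)
```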
Link format rules
Inside an H2 section, every link must be a markdown list item:
```markdown
<!-- Good -->
- [Quickstart](https://example.com/docs/start): Five-minute setup

<!-- Also good (no description) -->
- [API Reference](https://example.com/api)

<!-- Bad: bare URL -->
- https://example.com/docs/start

<!-- Bad: HTML anchor -->
<a href="https://example.com/docs/start">Quickstart</a>

<!-- Bad: generic anchor text -->
- [Click here](https://example.com/docs/start): Quickstart guide
```
The [text](url): description shape gives the AI three independent signals: link text, URL path, and a one-line description. Bare URLs lose two of those three signals, and generic anchor text ("click here", "read more") wastes the link text.
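A line-level classifier for the four shapes in the examples above might look like this. It is a regex sketch, not a full markdown parser. The GENERIC_TEXT phrase list is an assumption you would tune for your own content.

```python
import re

MD_LINK = re.compile(r"^\s*[-*]\s+\[(?P<text>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.+))?\s*$")
BARE_URL = re.compile(r"^\s*[-*]\s+https?://\S+\s*$")
HTML_ANCHOR = re.compile(r"<a\s+[^>]*href=", re.IGNORECASE)
# Illustrative list of anchor texts that carry no meaning on their own.
GENERIC_TEXT = {"click here", "here", "read more", "learn more", "link", "more"}

def classify_link_line(line: str) -> str:
    """Label one line as 'good', 'bare_url', 'html_anchor', 'generic_text' or 'unknown'."""
    if HTML_ANCHOR.search(line):
        return "html_anchor"
    if BARE_URL.match(line):
        return "bare_url"
    m = MD_LINK.match(line)
    if m is None:
        return "unknown"
    if m.group("text").strip().lower() in GENERIC_TEXT:
        return "generic_text"
    return "good"
```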
Important note on absolute URLs
The spec doesn't mandate absolute vs relative URLs, but every production adopter uses absolute URLs. Absolute survives non-root deployments, partial fetches, and quoting in AI responses. Use them.
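If your CMS or build emits relative paths, one option is to resolve them at build time with the standard library's urljoin. This is a minimal sketch under two assumptions: every link target sits inside markdown ](...) syntax, and base is your site root. absolutize_links is a hypothetical helper name.

```python
import re
from urllib.parse import urljoin

# Captures the target inside "](...)" of a markdown link.
MD_LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")

def absolutize_links(text: str, base: str) -> str:
    """Rewrite every markdown link target to an absolute URL resolved against base."""
    return MD_LINK_TARGET.sub(lambda m: "](" + urljoin(base, m.group(1)) + ")", text)
```

Links that are already absolute pass through unchanged, because urljoin returns an absolute URL as-is.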
3. llms.txt vs llms-full.txt
Reading the spec, you'll see references to a companion file: /llms-full.txt. The two files solve different problems for AI agents with different context budgets.
| | llms.txt | llms-full.txt |
|---|---|---|
| Purpose | Curated index | Full content dump |
| Format | Markdown table of contents with links | Inlined markdown of every page |
| Target size | Under 50KB ideal | Can be 1-10 MB |
| When AI agents use it | Tight context budget; need navigation | Generous budget; need depth |

Ship both? Yes, that's the Anthropic / Mintlify pattern.
Think of it like this: llms.txt is the table of contents. llms-full.txt is the whole book inlined. AI agents grab whichever fits their working memory.
Our checker treats /llms-full.txt presence as a +5 bonus signal. It's not required by the spec, but it's how the most adoption-forward sites ship.
4. Real-World Examples from Production Sites
Looking at what production sites are actually shipping is more useful than theorizing. Here are the patterns from major adopters as of May 2026.
Anthropic (docs.anthropic.com/llms.txt)
The reference implementation. Crisp H1, one-sentence blockquote, ~10 H2 sections grouped by audience ("Building with Claude", "API Reference", "Agents", "Tools"), and a paired /llms-full.txt. Total size: ~12KB. Every link is absolute. No bare URLs. No HTML anchors. Every link has a description after a colon. If you want to see what "clean" looks like, this is it.
Vercel (vercel.com/llms.txt)
The maximalist version. 17 H2 sections cover their entire documentation taxonomy, including Access, AI, Build & Deploy, CDN, CLI, Collaboration, Compute, Flags, Integrations, Multi-tenant, Observability, Platform, Pricing, Security, Storage, and REST API Reference. The file is ~150KB in total. That's well above the "ideal" 50KB threshold, but Vercel has so many doc surfaces that the depth is genuinely useful. Absolute URLs throughout.
Cloudflare (per-product files)
Different approach: Cloudflare ships per-product llms.txt files (Workers, Pages, R2, etc.) under their docs subdomains rather than one master file at the apex. The spec technically allows this, but it's less common — most AI agents only check the root. If you go this route, your apex llms.txt should at minimum link out to the per-product files.
Mintlify (auto-generated for every customer)
Mintlify auto-generates both /llms.txt and /llms-full.txt for every customer documentation site. The result: every Mintlify-hosted site has a spec-conformant llms.txt by default, regardless of whether the site owner explicitly opted in. This is the pattern other docs-platform vendors are starting to copy.
A minimal example for a typical SaaS site
You don't need 17 sections to ship a useful llms.txt. Here's a minimal one that covers the bases for a typical SaaS:
```markdown
# Acme Analytics

> Acme is real-time product analytics for engineering teams. Connects to your warehouse in 5 minutes.

## Product
- [Features](https://acme.com/features): Real-time dashboards, alerts, audit logs
- [Pricing](https://acme.com/pricing): Free / Pro / Enterprise pricing tiers
- [Integrations](https://acme.com/integrations): Snowflake, BigQuery, Postgres, MySQL

## Documentation
- [Quickstart](https://acme.com/docs/quickstart): Five-minute setup from npm install to first event
- [API Reference](https://acme.com/api): Full endpoint catalog with code samples
- [SDKs](https://acme.com/docs/sdks): Official libraries in 8 languages

## Customers
- [Case studies](https://acme.com/customers): Real-world deployments
- [Reviews](https://acme.com/reviews): G2 and TrustRadius coverage

## Optional
- [Changelog](https://acme.com/changelog): Version history (skippable)
- [Blog](https://acme.com/blog): Engineering posts
- [Press](https://acme.com/press): Coverage and announcements
```
That's under 1KB and hits every spec recommendation. You could ship this today.
5. The 14-Parameter Validation Ladder
Our LLMs.txt Checker scores files across 14 parameters. The full ladder, in order of weight:
| # | Parameter | Weight |
|---|---|---|
| 1 | File exists at /llms.txt (HTTP 200) | 40 |
| 2 | Has H1 title (spec-required) | 25 |
| 3 | Has at least one H2 + markdown link list | 20 |
| 4 | All sampled URLs return 200 (up to 25 link probe) | 15 |
| 5 | Has blockquote summary directly after H1 | 10 |
| 6 | Served as text/markdown or text/plain | 5 |
| 7 | No duplicate links | 5 |
| 8 | Markdown link format (no bare URLs / HTML anchors / generic text) | 5 |
| 9 | No auth-walled links (no 401/403 or login redirects) | 5 |
| 10 | Healthy file size (under 500KB) | 5 |
| 11 | Linked URLs not blocked by robots.txt | 5 |
| 12 | Site's robots.txt allows AI bots (GPTBot, ClaudeBot, etc.) | 5 |
| 13 | Bonus: /llms-full.txt companion exists | +5 |
| 14 | Bonus: HTML discovery tag (<link rel="llms">) | +3 |
The weight distribution reflects what AI agents actually penalize. A missing H1 alone is a 25-point hit because the spec calls it the only required element. Broken links cost 15 points because they actively degrade the AI's answer quality.
A note on cross-checks
Parameters #11 and #12 cross-check your llms.txt against your robots.txt. When your robots.txt is unavailable, the checker skips these (removes from the denominator) rather than penalizing — so your score still represents your llms.txt quality, not your infrastructure availability.
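The exact scoring formula isn't spelled out above, so treat this as an illustrative sketch under stated assumptions:

- base parameters form a weighted percentage;
- a check marked None is dropped from the denominator;
- bonus points are added on top, with the total capped at 100.

The parameter keys are made-up names mirroring the table.

```python
# Weights from the ladder above; the key names are illustrative, not the checker's internals.
BASE_WEIGHTS = {
    "exists": 40, "h1": 25, "h2_links": 20, "links_ok": 15, "summary": 10,
    "content_type": 5, "no_duplicates": 5, "link_format": 5, "no_auth_wall": 5,
    "size_ok": 5, "links_not_disallowed": 5, "ai_bots_allowed": 5,
}
BONUS_WEIGHTS = {"llms_full": 5, "discovery_tag": 3}

def score(results: dict) -> float:
    """results maps parameter -> True/False, or None when the check was skipped."""
    earned = possible = 0
    for name, weight in BASE_WEIGHTS.items():
        outcome = results.get(name)
        if outcome is None:
            continue  # skipped (e.g. robots.txt unavailable): drop from the denominator
        possible += weight
        earned += weight if outcome else 0
    pct = 100 * earned / possible if possible else 0.0
    bonus = sum(w for name, w in BONUS_WEIGHTS.items() if results.get(name))
    return min(100.0, pct + bonus)  # assumption: bonuses on top, capped at 100
```

Suppose both robots.txt cross-checks are skipped and every other base check passes. Under this sketch the file scores 100, instead of roughly 93 (135 of 145 points) if the skipped checks were counted as failures.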
6. How AI Agents Actually Use llms.txt
The spec describes what the file is. It doesn't describe how AI agents consume it. That part is implementation-defined per agent, but the pattern is converging. Here's the typical flow:
Inference-time loading
When a user query mentions or implies your domain, the AI agent:
- Resolves the domain (e.g., extracts "stripe.com" from the query).
- Issues a GET request to https://stripe.com/llms.txt.
- If it gets a 200 OK with a markdown content type, parses the H1, blockquote, and H2 sections.
- Identifies which sections look most relevant to the query.
- Fetches 2-5 URLs from those sections.
- Synthesizes an answer using that content.
The whole flow takes 1-3 seconds and happens before the AI generates the answer the user sees. If the agent can't find or parse your llms.txt — file missing, server returns HTML, content-type wrong — it falls back to web search. You lose the curated path entirely.
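Consumption is implementation-defined per agent, so the following only illustrates the parse-and-select steps in the list above. It assumes that naive word overlap stands in for whatever relevance ranking a real agent performs. The overlap is measured between the query and each link's section name, title, and description. parse_sections and pick_urls are hypothetical names.

```python
import re

H2 = re.compile(r"^##\s+(.+)$")
LINK = re.compile(r"^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")
WORD = re.compile(r"[a-z0-9]+")

def parse_sections(text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Map each H2 section name to its (title, url, description) link items."""
    sections: dict[str, list[tuple[str, str, str]]] = {}
    current = None
    for line in text.splitlines():
        if m := H2.match(line):
            current = m.group(1).strip()
            sections[current] = []
        elif current is not None and (m := LINK.match(line)):
            sections[current].append((m.group(1), m.group(2), m.group(3) or ""))
    return sections

def pick_urls(text: str, query: str, max_urls: int = 5) -> list[str]:
    """Return up to max_urls link targets ranked by word overlap with the query."""
    query_words = set(WORD.findall(query.lower()))
    scored = []
    for section, links in parse_sections(text).items():
        for title, url, desc in links:
            link_words = set(WORD.findall(f"{section} {title} {desc}".lower()))
            overlap = len(query_words & link_words)
            if overlap:
                scored.append((overlap, url))
    # sort() is stable, so ties keep their order from the file.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:max_urls]]
```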
Context window economics
AI agents have finite context windows. Loading your llms.txt costs them some of that budget. A 50KB llms.txt is roughly 12,000 tokens — significant but acceptable. A 500KB llms.txt is 120,000 tokens — most agents will refuse to load it or truncate aggressively.
That's why curation matters. Every link you list takes context budget. Listing 200 mediocre links is worse than listing 20 critical ones — the AI has to spend cycles deciding which to fetch, and may run out of budget before getting to the page that actually answers the query.
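The token figures above follow from the common rule of thumb of roughly four bytes per token for English text. The sketch below shows that arithmetic plus the size buckets this guide uses. The 4-byte ratio is an approximation, and real tokenizers vary by model.

```python
BYTES_PER_TOKEN = 4  # rule of thumb for English text; real tokenizers vary by model
IDEAL_BYTES = 50 * 1024
MAX_BYTES = 500 * 1024

def estimate_tokens(text: str) -> int:
    """Approximate token count from the UTF-8 byte length."""
    return len(text.encode("utf-8")) // BYTES_PER_TOKEN

def size_verdict(text: str) -> str:
    """Bucket a file against the 50KB ideal and the 500KB ceiling used in this guide."""
    size = len(text.encode("utf-8"))
    if size <= IDEAL_BYTES:
        return "ideal"
    if size <= MAX_BYTES:
        return "heavy: consider moving bulk content to /llms-full.txt"
    return "too large: split into a curated llms.txt plus llms-full.txt"
```

At that ratio, 50KB (51,200 bytes) comes to about 12,800 tokens and 500KB to about 128,000. Both are consistent with the rounded figures above.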
The Optional section as a release valve
The spec includes a special ## Optional section name with a specific semantic: AI agents can skip these URLs when context is tight. Put low-priority content here — changelog, press coverage, marketing pages — so the AI knows to drop them first under budget pressure.
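How an agent applies ## Optional is up to the agent. The sketch below shows one plausible policy, offered as an assumption rather than any vendor's documented behavior: if the file is over budget, drop the Optional section before anything else.

```python
def trim_for_budget(text: str, max_bytes: int) -> str:
    """If text exceeds max_bytes, drop the '## Optional' section first."""
    if len(text.encode("utf-8")) <= max_bytes:
        return text
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Start skipping at the Optional heading; stop at the next H2.
            skipping = line[3:].strip() == "Optional"
        if not skipping:
            kept.append(line)
    # The result can still exceed max_bytes; an agent would then truncate further.
    return "\n".join(kept)
```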
7. The robots.txt Trap (and How to Avoid It)
The single biggest mistake we see in production: publishing llms.txt while blocking AI bots in robots.txt. This is self-defeating, and it's shockingly common.
Here's the broken setup:
```text
# robots.txt: BROKEN if you also publish llms.txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```
Why this breaks: having an llms.txt file is an active invitation to AI agents. Blocking those same agents in robots.txt is telling them to go away. Well-behaved AI agents check both files. If robots.txt disallows them, they skip your entire site — including the llms.txt you so carefully curated.
The fix is straightforward — explicitly allow the AI bots you want to read llms.txt:
```text
# robots.txt: works with llms.txt
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```
Our checker flags this contradiction as a moderate-severity issue. Easy fix, big upside.
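You can catch the contradiction before deploying with Python's standard urllib.robotparser. The sketch checks a handful of AI user agents taken from the robots.txt examples above; the list is not exhaustive. Note that urllib.robotparser applies the first matching rule in an entry rather than the longest-match precedence some crawlers use, so treat its answer as approximate.

```python
from urllib.robotparser import RobotFileParser

# AI user agents from the robots.txt examples above; not an exhaustive list.
AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_ai_bots(robots_txt: str, url: str) -> list[str]:
    """Return the AI user agents that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]
```

Against the broken file above, checking https://example.com/llms.txt reports GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Against the fixed file, it reports nothing.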
8. Common Mistakes and How to Fix Them
Here's what we see most often in our checker telemetry — ranked by frequency.
Mistake 1: Server returns HTML instead of markdown
You publish an llms.txt, but when you visit yourdomain.com/llms.txt in the browser, you see your homepage. That means a SPA route or custom 404 page is catching the request. AI agents expect markdown — they bail when they see HTML.
Fix: configure your server (Next.js public folder, Vercel static routing, Cloudflare Pages, whatever) to serve /llms.txt as a static file with Content-Type: text/markdown.
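The right production setting depends on your host, so check its static-file documentation. For previewing locally, Python's built-in http.server can be told to label just the llms files as markdown. This sketch is for local testing, not production serving. LlmsTxtHandler and serve are illustrative names.

```python
import os
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

LLMS_FILES = ("llms.txt", "llms-full.txt")

class LlmsTxtHandler(SimpleHTTPRequestHandler):
    """Static-file handler that labels the llms files as markdown."""

    def guess_type(self, path):
        # Only the llms files get text/markdown; other files keep the default detection.
        if os.path.basename(path) in LLMS_FILES:
            return "text/markdown; charset=utf-8"
        return super().guess_type(path)

def serve(port: int = 8000) -> None:
    """Serve the current directory for local preview (not for production)."""
    ThreadingHTTPServer(("", port), LlmsTxtHandler).serve_forever()
```

Call serve() from the directory that contains llms.txt. Then fetch http://localhost:8000/llms.txt and inspect the Content-Type header.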
Mistake 2: Missing H1
The spec's only required element. We've seen llms.txt files that start with ## Documentation directly. Without an H1, AI agents may skip the file entirely or apply a generic project name from the URL.
Fix: start the file with # Your Site Name on the first non-blank line.
Mistake 3: Bare URLs instead of markdown links
Authors paste URLs as-is, like - https://example.com/docs. This works technically, but wastes the description slot. AI agents pull the link text as context — bare URLs give them nothing extra.
Fix: use the [Title](url): description format every time.
Mistake 4: HTML anchor tags
Often a result of copy-paste from a rendered web page. Something like <a href="...">Quickstart</a>. Markdown parsers may or may not extract these correctly.
Fix: convert to markdown link syntax. If your CMS exports HTML, run it through a markdown converter before publishing.
Mistake 5: Generic anchor text ("Click here")
Take a line like - [Click here](https://example.com/pricing): Pricing page. The description redeems it, but the link text tells the AI nothing. Some agents may de-prioritize generic-text links when picking which to fetch.
Fix: use descriptive link text that stands alone, such as [Pricing plans](https://example.com/pricing).
Mistake 6: Auth-walled or paywalled links
Listing a URL that returns 401 or redirects to a login page is pointing AI at a dead end. Anonymous AI agents can't authenticate. They get a login screen and abandon the link.
Fix: either expose the content publicly or remove it from llms.txt. If you have to gate it, link to the public marketing page that previews the gated content instead.
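Here is a rough way to flag these links automatically. It assumes you have already fetched each URL and recorded the final status code and the URL after redirects (for example from urllib.request.urlopen's geturl(), or from an HTTPError's code). The login-path hints are a heuristic of our own and will miss custom login routes.

```python
from urllib.parse import urlparse

# Heuristic path fragments that suggest a login page; illustrative, not exhaustive.
LOGIN_HINTS = ("login", "log-in", "signin", "sign-in")

def looks_auth_walled(status: int, final_url: str) -> bool:
    """Flag a link whose fetch ended in 401/403 or landed on a login-like path."""
    if status in (401, 403):
        return True
    path = urlparse(final_url).path.lower()
    return any(hint in path for hint in LOGIN_HINTS)
```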
Mistake 7: Links that conflict with your own robots.txt
Your llms.txt lists https://example.com/internal/api, but your robots.txt has Disallow: /internal/. Well-behaved AI agents follow robots.txt and refuse to load. You've advertised a dead end.
Fix: cross-check your llms.txt URLs against your own robots.txt. Pick one rule or the other — don't contradict yourself.
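The cross-check is mechanical, so it is a good candidate for a build step. This sketch uses Python's urllib.robotparser. It assumes every absolute link in llms.txt is on the same host as the robots.txt you pass in; links to other domains would need that domain's robots.txt instead. disallowed_links is a hypothetical helper.

```python
import re
from urllib.robotparser import RobotFileParser

# Absolute http(s) targets of markdown links.
LINK_URL = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")

def disallowed_links(llms_txt: str, robots_txt: str, agent: str = "*") -> list[str]:
    """Return llms.txt link targets that robots_txt forbids for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in LINK_URL.findall(llms_txt) if not parser.can_fetch(agent, url)]
```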
Mistake 8: File too large (over 500KB)
A 1MB llms.txt is "technically valid" but defeats its own purpose. AI agents may truncate it, refuse to load it, or burn so much context window that there's nothing left for the actual page content.
Fix: split. Keep /llms.txt as a focused curated index (under 50KB). Move the bulk content to /llms-full.txt. AI agents with budget will grab the full file; agents with tight budget stick to the index.
Mistake 9: Stale content
You shipped an llms.txt in January 2025. Now it's May 2026, you've renamed three sections, killed a product line, and added a new feature. The llms.txt still points at the January URLs. AI agents follow stale links to dead pages, then fall back to web search anyway.
Fix: treat llms.txt like a sitemap. Regenerate quarterly, or whenever navigation changes.
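Here is one way to automate the staleness check, assuming your sitemap.xml is generated from the live site and kept current: flag every llms.txt link that no longer appears in it. This sketch handles a single urlset file, not a sitemap index. stale_links is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
LINK_URL = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")

def stale_links(llms_txt: str, sitemap_xml: str) -> list[str]:
    """Return llms.txt links missing from sitemap.xml (candidates for review)."""
    root = ET.fromstring(sitemap_xml)
    live = {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}
    return [url for url in LINK_URL.findall(llms_txt) if url not in live]
```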
Mistake 10: No /llms-full.txt companion
Not a critical mistake — it's a missed opportunity. AI agents with generous context budgets can answer better when they can grab full content in one shot. Shipping /llms-full.txt alongside the curated index gives you both surfaces.
Fix: most static site generators have plugins that auto-generate /llms-full.txt from your markdown content. Or write a small build script. Or use a tool like ours that auto-generates both.
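As a starting point for that build script, here is a sketch that inlines every .md file under a docs folder into one body, in path order. The source comment separating pages is our own convention, not something any spec defines. build_llms_full is a hypothetical name.

```python
from pathlib import Path

def build_llms_full(docs_dir: str, title: str) -> str:
    """Concatenate every .md file under docs_dir into one llms-full.txt body."""
    parts = [f"# {title}", ""]
    for path in sorted(Path(docs_dir).rglob("*.md")):
        body = path.read_text(encoding="utf-8").strip()
        # Label each inlined page with its relative path so agents can tell pages apart.
        parts.append(f"<!-- source: {path.relative_to(docs_dir).as_posix()} -->")
        parts.append(body)
        parts.append("")
    return "\n".join(parts)
```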
9. Generating Yours with AI
Hand-curating an llms.txt for a small marketing site takes 30 minutes. For a 500-page documentation site, it's brutal — you're manually deciding which URLs make the cut, which section each one belongs in, what description gives the AI useful signal, and so on. That's why AI-powered generators exist.
Our LLMs.txt Checker includes a generator that runs in four steps:
- Crawl: automatically discovers up to 50 of your most important pages, prioritized by homepage status, internal link count, and content freshness.
- AI categorize: feeds the page list (URL + title + meta description) to our centralized AI service (DeepSeek primary, Gemini and Claude fallback). The AI groups pages into H2 sections and writes the H1 + blockquote summary.
- Edit: returns an editable markdown textarea. You refine section names, tighten descriptions, and move low-priority URLs into the Optional section.
- Ship: copy or download the final file and upload it to your domain root as /llms.txt.
The generator uses 50 credits per run. Credits are refunded automatically if generation fails or returns malformed output, so a failed run costs nothing. Typical run time: 30-90 seconds.
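Whatever model does the categorizing, the final assembly is mechanical. This sketch renders already-grouped pages into the spec's H1, blockquote, and H2 link-list shape, and always moves a section named Optional to the end. The input shape is an assumption for illustration, not our generator's actual data model.

```python
def render_llms_txt(name: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render H1, blockquote and H2 link lists from (title, url, description) tuples."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    # False sorts before True, and sorted() is stable, so only Optional moves to the end.
    ordered = sorted(sections.items(), key=lambda item: item[0] == "Optional")
    for section, links in ordered:
        lines.append(f"## {section}")
        for title, url, desc in links:
            lines.append(f"- [{title}]({url})" + (f": {desc}" if desc else ""))
        lines.append("")
    return "\n".join(lines)
```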
A note on AI generator output
Always edit the AI output before publishing. The AI is good at structure and categorization — it's less good at brand voice, product nuance, and knowing which obscure page secretly drives 30% of your support tickets. Treat the output as a strong first draft, not a finished file.
10. Frequently Asked Questions
Will AI agents actually load my llms.txt, or is this all theoretical?
They actually load it. Inference-time loading by ChatGPT browse mode, Claude, Perplexity, and Gemini is now standard behavior when a query implies a specific domain. Adoption on the AI agent side is mature; the bottleneck is on the publisher side — most sites still haven't shipped one.
Should I publish llms.txt if my site is mostly marketing pages (no docs)?
Yes. Marketing sites benefit just as much — maybe more. When AI agents answer questions like "what does Acme do?" or "how does Acme's pricing compare?", you want the agent to load your framing, not a competitor's blog post or an old Reddit thread.
What about multi-language sites?
The spec doesn't define multi-language behavior. Two common patterns in production: one master llms.txt with cross-language sections (e.g., "Documentation (English)", "Documentation (Japanese)"), or per-locale subpaths like /en/llms.txt and /ja/llms.txt. The latter isn't spec-required and AI agents may not auto-discover the per-locale files — the master-file approach is safer.
Does llms.txt replace sitemap.xml?
No. They're different files for different audiences. sitemap.xml is the exhaustive URL list for traditional search engines (Google, Bing). llms.txt is the curated content index for AI agents. Ship both.
What if I have content I don't want AI to use for training?
llms.txt is for inference-time guidance — what the AI loads when answering a user query. Training-time opt-out is a separate problem. For that, use AI-specific User-agent Disallow rules in robots.txt (GPTBot, ClaudeBot training crawlers honor these) or the emerging /ai.txt standard. They're complementary, not alternatives.
Can a single llms.txt file have thousands of links?
Technically yes, practically no. Depending on description length, a few hundred links is enough to push the file past 50KB, and you start losing AI agents that have tight context budgets. If you genuinely have thousands of URLs worth surfacing, curate the top ~50 into llms.txt and put the full list in llms-full.txt.
Is there a way to validate my llms.txt automatically?
Yes — our LLMs.txt Checker validates across 14 parameters in about 30 seconds. Free, no sign-up to view results.
Will llms.txt still matter in 2027?
Almost certainly. Either: (a) the spec stays the de facto standard and adoption keeps growing — which is the current trajectory; or (b) it gets superseded by a successor spec, in which case sites that shipped llms.txt will be in the same position (or better) when they migrate. Either path rewards shipping now.
What's the difference between ai.txt, llms.txt, and robots.txt?
Three files, three purposes. robots.txt: traditional crawler access control ("don't crawl this URL"). ai.txt: AI training opt-out (separate emerging spec). llms.txt: inference-time content curation for AI agents. They coexist. See our companion post llms.txt vs robots.txt for the full comparison.
Ship your llms.txt today
Validate yours across 14 spec-grade parameters in 30 seconds. Don't have one? Our AI generator crawls your site and produces a spec-conformant llms.txt in under 2 minutes.
Run the LLMs.txt Checker