llms.txt: The Complete 2026 SEO Guide for AI Search

14 min read · Technical SEO

When a user asks ChatGPT "what does Stripe's pricing look like?", the AI doesn't scroll Google results. It loads stripe.com/llms.txt first. That single file decides what context the AI uses to answer — and what it says about your product. llms.txt is the new front door, and most sites still don't have one. This guide covers everything you need to ship a spec-conformant llms.txt in 2026, from the original spec by Jeremy Howard at Answer.AI to the 14-point validation checklist we built into our free LLMs.txt Checker.

TL;DR Summary

  • llms.txt is a markdown file at yourdomain.com/llms.txt that tells AI agents which pages matter most.
  • It was proposed by Jeremy Howard at Answer.AI in September 2024 and has already been adopted by Anthropic, Vercel, Cloudflare, Mintlify, and Hugging Face.
  • The format is pure markdown: H1 (required) + blockquote summary + H2 sections of curated links.
  • Different from robots.txt: robots.txt gatekeeps access. llms.txt actively curates content for AI to load at inference time.
  • Pair with /llms-full.txt for AI agents that want full content depth in one fetch.
  • Keep it under 50KB ideally. Over 500KB defeats the curation purpose — use llms-full.txt for bulk.
  • Don't publish llms.txt while blocking AI bots in robots.txt — that's self-defeating.
  • Use our LLMs.txt Checker to validate yours across 14 spec-grade parameters. AI-powered generator included.

1. Why llms.txt Matters in 2026

For 25 years, SEO meant one thing: ranking on Google. You wrote content, you hoped Google would crawl it, and you measured success in SERP positions and organic clicks. That world is fading fast.

By mid-2026, roughly half of search-style queries go through AI assistants instead. Someone asks ChatGPT "what's the best PostgreSQL ORM for Node.js?", Claude "how does Stripe's pricing compare to Adyen's?", or Perplexity "is Vercel cheaper than AWS Amplify for my use case?". The AI answers directly. The user never sees a SERP. The question for site owners stopped being "does Google know about us?" — and became "does the AI know what to load when our domain comes up?".

That's exactly what llms.txt solves. You publish one markdown file at yourdomain.com/llms.txt. AI agents check it. They follow the curated links you list. They use that content to answer questions about you. Agents that support the standard honor it, and adoption is growing fast.

The compounding cost of not having one

Without llms.txt, an AI agent answering a question about your domain falls back to whatever Google or Bing happens to have indexed. Which is usually:

  • Outdated. Search engines might still surface a pricing page from 2024.
  • Noisy. The AI might cite your customer review section instead of your actual feature list.
  • Wrong. Without curation, AI agents have surfaced competitor mentions or blog comments as if they were your official position.
  • Missing the framing you wanted. You lose control of the first impression.

Shipping an llms.txt is a few hours of work. The asset compounds over the next 5+ years of AI search adoption. That's about as good a leverage ratio as you'll find in technical SEO.

2. The Complete llms.txt Spec

Jeremy Howard at Answer.AI proposed the standard in September 2024. The spec lives at llmstxt.org and is intentionally minimal — pure markdown, no XML, no JSON schema. That minimalism is the point: any LLM can parse it without specialized libraries.

The full structure

# Site Name

> One-sentence summary — what the site is and who it's for.

Optional free-form prose between the blockquote and the first H2.

## Section Name

- [Link title](https://example.com/path): Optional description
- [Another link](https://example.com/other)

## Another Section

- [Page title](https://example.com/section/page)

## Optional

- [Lower-priority page](https://example.com/changelog): Skippable for tight context

What's required

The spec is almost aggressively minimal:

  • H1 with site or project name — this is the only required element of the entire spec. Everything else is optional, but recommended.

What's strongly recommended

  • Blockquote summary (>) directly after the H1. Gives the AI one sentence of project framing before scanning the link list. Without it, the AI infers context from URLs — which is unreliable.
  • One or more H2 sections with markdown link lists. Without these, the file is just a title and provides zero navigation value to AI agents.
  • An H2 literally named ## Optional — content AI agents can skip when context is tight. Spec-defined.
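To make the structure above concrete, here is a minimal parsing sketch in Python. It is not an official parser: the names `LlmsTxt`, `parse_llms_txt`, and `LINK_RE` are hypothetical, and it assumes each link sits on a single list-item line in the `[title](url): description` shape.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical pattern for "- [title](url): optional description" list items.
LINK_RE = re.compile(
    r"^[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)

@dataclass
class LlmsTxt:
    title: Optional[str] = None    # H1, the only required element
    summary: Optional[str] = None  # blockquote directly after the H1
    sections: dict = field(default_factory=dict)  # H2 name -> [(title, url, desc)]

def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections.setdefault(current, [])
        elif line.startswith("# ") and doc.title is None:
            doc.title = line[2:].strip()
        elif line.startswith(">") and doc.summary is None and current is None:
            doc.summary = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return doc

SAMPLE = """# Acme Analytics

> Real-time product analytics.

## Docs

- [Quickstart](https://acme.com/docs/quickstart): Five-minute setup
- [API Reference](https://acme.com/api)

## Optional

- [Changelog](https://acme.com/changelog): Version history
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title, list(doc.sections))
```

A file with no H1 leaves `doc.title` as `None`, which is the one condition the spec treats as non-conformant.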

File location rules

  • Serve it over HTTPS at the site root: https://yourdomain.com/llms.txt
  • The spec also permits subpaths such as /docs/llms.txt, but agents that only check the root may never discover them
  • Must respond with HTTP 200
  • Served with Content-Type: text/markdown or text/plain
  • Never text/html — that signals to AI agents that you mis-routed the request
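The rules in this list can be expressed as a small check over a response's URL, status code, and Content-Type header. This is a sketch under assumptions: `check_serving` and `ACCEPTED_TYPES` are hypothetical names, and the function inspects values you have already fetched rather than making a request itself.

```python
from urllib.parse import urlsplit

ACCEPTED_TYPES = {"text/markdown", "text/plain"}

def check_serving(url: str, status: int, content_type: str) -> list:
    """Return a list of problems; an empty list means the response looks fine."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("not served over HTTPS")
    if parts.path != "/llms.txt":
        problems.append("not at the root /llms.txt path")
    if status != 200:
        problems.append(f"HTTP status {status}, expected 200")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(
            f"Content-Type {media_type or 'missing'} is not text/markdown or text/plain"
        )
    return problems

print(check_serving("https://example.com/llms.txt", 200, "text/markdown; charset=utf-8"))
print(check_serving("http://example.com/docs/llms.txt", 404, "text/html"))
```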

Link format rules

Inside an H2 section, every link must be a markdown list item:

# Good
- [Quickstart](https://example.com/docs/start): Five-minute setup

# Also good (no description)
- [API Reference](https://example.com/api)

# Bad — bare URL
- https://example.com/docs/start

# Bad — HTML anchor
- <a href="https://example.com/docs/start">Quickstart</a>

# Bad — generic anchor text
- [Click here](https://example.com/docs/start): Quickstart guide

The [text](url): description shape gives the AI three independent signals: link text, URL path, and one-line description. Bare URLs lose two of those three signals. Generic anchor text ("click here", "read more") wastes the link-text signal even when a description follows.
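A linter for these link rules might look like the sketch below. The function name `classify_link_line`, the category labels, and the `GENERIC_TEXT` set are illustrative assumptions; the set contains only the two phrases named above and would need extending in practice.

```python
import re

MD_LINK = re.compile(r"^-\s+\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.+))?$")
GENERIC_TEXT = {"click here", "read more"}

def classify_link_line(line: str) -> str:
    """Label one list-item line as ok, bare-url, html-anchor, generic-text, or not-a-link."""
    line = line.strip()
    if re.match(r"^-\s+<a\s", line, re.IGNORECASE):
        return "html-anchor"
    if re.match(r"^-\s+https?://\S+$", line):
        return "bare-url"
    match = MD_LINK.match(line)
    if not match:
        return "not-a-link"
    if match.group(1).strip().lower() in GENERIC_TEXT:
        return "generic-text"
    return "ok"

for example in [
    "- [Quickstart](https://example.com/docs/start): Five-minute setup",
    "- https://example.com/docs/start",
    "- [Click here](https://example.com/docs/start): Quickstart guide",
]:
    print(classify_link_line(example), "<-", example)
```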

Important note on absolute URLs

The spec doesn't mandate absolute vs relative URLs, but every production adopter uses absolute URLs. Absolute survives non-root deployments, partial fetches, and quoting in AI responses. Use them.
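One way to see the problem with relative links: the same relative path resolves to different pages depending on where the file is served from. The short sketch below uses Python's standard `urllib.parse`; the helper name `is_absolute` and the example paths are assumptions for illustration.

```python
from urllib.parse import urljoin, urlsplit

def is_absolute(url: str) -> bool:
    """True when the URL carries its own scheme and host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

relative = "guide/start"
# The same relative link, resolved against two possible file locations.
at_root = urljoin("https://example.com/llms.txt", relative)
in_subdir = urljoin("https://example.com/docs/llms.txt", relative)

print(at_root)    # https://example.com/guide/start
print(in_subdir)  # https://example.com/docs/guide/start
```

An absolute URL has only one possible reading, which is why it also survives being quoted in an AI response with no base URL attached.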

3. llms.txt vs llms-full.txt

Reading the spec, you'll see references to a companion file: /llms-full.txt. The two files solve different problems for AI agents with different context budgets.

| | llms.txt | llms-full.txt |
|---|---|---|
| Purpose | Curated index | Full content dump |
| Format | Markdown table of contents with links | Inlined markdown of every page |
| Target size | Under 50KB ideal | Can be 1-10 MB |
| When AI agents use it | Tight context budget; need navigation | Generous budget; need depth |
| Ship both? | Yes: that's the Anthropic / Mintlify pattern | Yes |

Think of it like this: llms.txt is the table of contents. llms-full.txt is the whole book inlined. AI agents grab whichever fits their working memory.

Our checker treats /llms-full.txt presence as a +5 bonus signal. It's not required by the spec, but it's how the most adoption-forward sites ship.

4. Real-World Examples from Production Sites

Looking at what production sites are actually shipping is more useful than theorizing. Here are the patterns from major adopters as of May 2026.

Anthropic (docs.anthropic.com/llms.txt)

The reference implementation. Crisp H1, one-sentence blockquote, ~10 H2 sections grouped by audience ("Building with Claude", "API Reference", "Agents", "Tools"), and a paired /llms-full.txt. Total size: ~12KB. Every link is absolute. No bare URLs. No HTML anchors. Every link has a description after a colon. If you want to see what "clean" looks like, this is it.

Vercel (vercel.com/llms.txt)

The maximalist version. 17 H2 sections covering their entire documentation taxonomy, including Access, AI, Build & Deploy, CDN, CLI, Collaboration, Compute, Flags, Integrations, Multi-tenant, Observability, Platform, Pricing, Security, Storage, and REST API Reference. ~150KB total. That's well above the "ideal" 50KB threshold, but Vercel has so many doc surfaces that the depth is genuinely useful. Absolute URLs throughout.

Cloudflare (per-product files)

Different approach: Cloudflare ships per-product llms.txt files (Workers, Pages, R2, etc.) under their docs subdomains rather than one master file at the apex. The spec technically allows this, but it's less common — most AI agents only check the root. If you go this route, your apex llms.txt should at minimum link out to the per-product files.

Mintlify (auto-generated for every customer)

Mintlify auto-generates both /llms.txt and /llms-full.txt for every customer documentation site. The result: every Mintlify-hosted site has a spec-conformant llms.txt by default, regardless of whether the site owner explicitly opted in. This is the pattern other docs-platform vendors are starting to copy.

A minimal example for a typical SaaS site

You don't need 17 sections to ship a useful llms.txt. Here's a minimal one that covers the bases for a typical SaaS:

# Acme Analytics

> Acme is real-time product analytics for engineering teams. Connects to your warehouse in 5 minutes.

## Product

- [Features](https://acme.com/features): Real-time dashboards, alerts, audit logs
- [Pricing](https://acme.com/pricing): Free / Pro / Enterprise pricing tiers
- [Integrations](https://acme.com/integrations): Snowflake, BigQuery, Postgres, MySQL

## Documentation

- [Quickstart](https://acme.com/docs/quickstart): Five-minute setup from npm install to first event
- [API Reference](https://acme.com/api): Full endpoint catalog with code samples
- [SDKs](https://acme.com/docs/sdks): Official libraries in 8 languages

## Customers

- [Case studies](https://acme.com/customers): Real-world deployments
- [Reviews](https://acme.com/reviews): G2 and TrustRadius coverage

## Optional

- [Changelog](https://acme.com/changelog): Version history (skippable)
- [Blog](https://acme.com/blog): Engineering posts
- [Press](https://acme.com/press): Coverage and announcements

That's about 1KB and hits every spec recommendation. You could ship this today.

5. The 14-Parameter Validation Ladder

Our LLMs.txt Checker scores files across 14 parameters. The full ladder, in order of weight:

| # | Parameter | Weight |
|---|---|---|
| 1 | File exists at /llms.txt (HTTP 200) | 40 |
| 2 | Has H1 title (spec-required) | 25 |
| 3 | Has at least one H2 + markdown link list | 20 |
| 4 | All sampled URLs return 200 (probes up to 25 links) | 15 |
| 5 | Has blockquote summary directly after H1 | 10 |
| 6 | Served as text/markdown or text/plain | 5 |
| 7 | No duplicate links | 5 |
| 8 | Markdown link format (no bare URLs / HTML anchors / generic text) | 5 |
| 9 | No auth-walled links (no 401/403 or login redirects) | 5 |
| 10 | Healthy file size (under 500KB) | 5 |
| 11 | Linked URLs not blocked by robots.txt | 5 |
| 12 | Site's robots.txt allows AI bots (GPTBot, ClaudeBot, etc.) | 5 |
| 13 | Bonus: /llms-full.txt companion exists | +5 |
| 14 | Bonus: HTML discovery tag (<link rel="llms">) | +3 |

The weight distribution reflects what AI agents actually penalize. A missing H1 alone is a 25-point hit because the spec calls it the only required element. Broken links cost 15 points because they actively degrade the AI's answer quality.

A note on cross-checks

Parameters #11 and #12 cross-check your llms.txt against your robots.txt. When your robots.txt is unavailable, the checker skips these (removes from the denominator) rather than penalizing — so your score still represents your llms.txt quality, not your infrastructure availability.
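Removing a skipped parameter from the denominator can be illustrated with a short calculation. This is only a sketch of the idea, not the checker's actual implementation: the weights come from the table above, while the key names, the `score` function, and the percentage normalization are assumptions, and the two bonus parameters are left out for simplicity.

```python
from typing import Optional

# Base weights for parameters 1-12 (key names are hypothetical).
WEIGHTS = {
    "file_exists": 40,
    "has_h1": 25,
    "has_h2_links": 20,
    "links_return_200": 15,
    "has_blockquote": 10,
    "content_type": 5,
    "no_duplicates": 5,
    "link_format": 5,
    "no_auth_walls": 5,
    "file_size": 5,
    "links_allowed_by_robots": 5,
    "robots_allows_ai_bots": 5,
}

def score(results: dict[str, Optional[bool]]) -> float:
    """results maps parameter -> True (pass), False (fail), or None (skipped)."""
    earned = sum(w for name, w in WEIGHTS.items() if results.get(name) is True)
    # Skipped (None) parameters drop out of the denominator entirely.
    possible = sum(w for name, w in WEIGHTS.items() if results.get(name) is not None)
    return 100.0 * earned / possible if possible else 0.0

results = {name: True for name in WEIGHTS}
results["links_allowed_by_robots"] = None  # robots.txt unavailable
results["robots_allows_ai_bots"] = None
print(score(results))  # 100.0

penalized = dict(results, links_allowed_by_robots=False, robots_allows_ai_bots=False)
print(round(score(penalized), 1))  # 93.1
```

With the skip, an otherwise perfect file still scores 100. Counting the two checks as failures would pull it down to about 93 for a problem that has nothing to do with llms.txt itself.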

6. How AI Agents Actually Use llms.txt

The spec describes what the file is. It doesn't describe how AI agents consume it. That part is implementation-defined per agent, but the pattern is converging. Here's the typical flow:

Inference-time loading

When a user query mentions or implies your domain, the AI agent:

  1. Resolves the domain (e.g., extracts "stripe.com" from the query).
  2. Issues a GET request to https://stripe.com/llms.txt.
  3. If 200 OK with markdown content-type → parses H1, blockquote, H2 sections.
  4. Identifies which sections look most relevant to the query.
  5. Fetches 2-5 URLs from those sections.
  6. Synthesizes an answer using that content.

The whole flow takes 1-3 seconds and happens before the AI generates the answer the user sees. If the agent can't find or parse your llms.txt — file missing, server returns HTML, content-type wrong — it falls back to web search. You lose the curated path entirely.
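Steps 4 and 5 of that flow are implementation-defined, so the following is only a toy illustration of what "pick the most relevant links" could mean. It ranks links by keyword overlap with the query; real agents almost certainly use something more sophisticated. The names `tokens` and `pick_links` are hypothetical, and `sections` is assumed to map each H2 name to a list of `(title, url, description)` tuples.

```python
import re

def tokens(text: str) -> set:
    """Lowercased alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def pick_links(sections: dict, query: str, limit: int = 5) -> list:
    """Return up to `limit` URLs whose section, title, or description overlaps the query."""
    query_words = tokens(query)
    scored = []
    for name, links in sections.items():
        for title, url, desc in links:
            overlap = len(query_words & tokens(f"{name} {title} {desc}"))
            if overlap:
                scored.append((overlap, url))
    # Stable sort: ties keep the order in which links appear in the file.
    scored.sort(key=lambda pair: -pair[0])
    return [url for _, url in scored[:limit]]

sections = {
    "Product": [
        ("Pricing", "https://acme.com/pricing", "Free / Pro / Enterprise pricing tiers"),
        ("Features", "https://acme.com/features", "Real-time dashboards"),
    ],
    "Docs": [
        ("Quickstart", "https://acme.com/docs/quickstart", "Five-minute setup"),
    ],
}
print(pick_links(sections, "What does Acme pricing look like?"))
```

Even in this crude form, the sketch shows why descriptive link text and descriptions matter: they are the only material a selection step has to work with before fetching anything.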

Context window economics

AI agents have finite context windows. Loading your llms.txt costs them some of that budget. A 50KB llms.txt is roughly 12,000 tokens — significant but acceptable. A 500KB llms.txt is 120,000 tokens — most agents will refuse to load it or truncate aggressively.
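The arithmetic behind those figures is a bytes-to-tokens ratio. The sketch below assumes about 4 bytes per token, a common rule of thumb for English text. Actual counts depend on the tokenizer, and the figures quoted above imply a slightly higher ratio. `estimate_tokens` is a hypothetical helper.

```python
def estimate_tokens(text: str, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate from the UTF-8 size of the text."""
    size_bytes = len(text.encode("utf-8"))
    return round(size_bytes / bytes_per_token)

print(estimate_tokens("a" * 50_000))   # 12500
print(estimate_tokens("a" * 500_000))  # 125000
```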

That's why curation matters. Every link you list takes context budget. Listing 200 mediocre links is worse than listing 20 critical ones — the AI has to spend cycles deciding which to fetch, and may run out of budget before getting to the page that actually answers the query.

The Optional section as a release valve

The spec includes a special ## Optional section name with a specific semantic: AI agents can skip these URLs when context is tight. Put low-priority content here — changelog, press coverage, marketing pages — so the AI knows to drop them first under budget pressure.
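A consumer could honor that semantic roughly as follows. This is a sketch, not a description of any particular agent: `links_within_budget` is a hypothetical name, the budget is counted in links rather than tokens for simplicity, and `sections` is assumed to map each H2 name to a list of URLs.

```python
def links_within_budget(sections: dict, max_links: int) -> list:
    """Keep regular links first; spend whatever budget remains on Optional ones."""
    required = [
        url
        for name, urls in sections.items()
        if name != "Optional"
        for url in urls
    ]
    optional = sections.get("Optional", [])
    remaining = max(0, max_links - len(required))
    return required[:max_links] + optional[:remaining]

sections = {
    "Docs": ["https://acme.com/docs/quickstart", "https://acme.com/api"],
    "Optional": ["https://acme.com/changelog", "https://acme.com/blog"],
}
print(links_within_budget(sections, 3))
```

With a budget of 3, both Docs links survive and only the first Optional link is kept. With a budget of 2, the Optional section disappears entirely.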

7. The robots.txt Trap (and How to Avoid It)

The single biggest mistake we see in production: publishing llms.txt while blocking AI bots in robots.txt. This is self-defeating, and it's shockingly common.

Here's the broken setup:

# robots.txt — BROKEN if you also publish llms.txt

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Why this breaks: having an llms.txt file is an active invitation to AI agents. Blocking those same agents in robots.txt is telling them to go away. Well-behaved AI agents check both files. If robots.txt disallows them, they skip your entire site — including the llms.txt you so carefully curated.

The fix is straightforward — explicitly allow the AI bots you want to read llms.txt:

# robots.txt — works with llms.txt

User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml

Our checker flags this contradiction as a moderate-severity issue. Easy fix, big upside.
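You can test for this contradiction yourself with Python's standard `urllib.robotparser`. The sketch below is not the checker's implementation; `blocked_ai_bots` and the `AI_BOTS` list are assumptions, and the list should be adjusted to the crawlers you care about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical watch list; extend or trim as needed.
AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/llms.txt") -> list:
    """Return the bots from AI_BOTS that robots.txt forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]

BROKEN = """
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""

print(blocked_ai_bots(BROKEN))  # ['GPTBot', 'ClaudeBot']
```

An empty list means none of the listed bots is shut out of /llms.txt. Any names that come back are the user agents to unblock.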

8. Common Mistakes and How to Fix Them

Here's what we see most often in our checker telemetry — ranked by frequency.

Mistake 1: Server returns HTML instead of markdown

You publish an llms.txt, but when you visit yourdomain.com/llms.txt in the browser, you see your homepage. That means a SPA route or custom 404 page is catching the request. AI agents expect markdown — they bail when they see HTML.

Fix: configure your server (Next.js public folder, Vercel static routing, Cloudflare Pages, whatever) to serve /llms.txt as a static file with Content-Type: text/markdown.
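A quick way to catch this failure in a deploy check is to sniff the start of the response body. The heuristic below is an assumption-laden sketch (`looks_like_html` is a hypothetical helper, and the 512-character window is arbitrary), but it flags the common case where a homepage or 404 page is returned in place of the file.

```python
def looks_like_html(body: str) -> bool:
    """Heuristic: does the start of the body look like an HTML document?"""
    head = body.lstrip()[:512].lower()
    return (
        head.startswith(("<!doctype html", "<html"))
        or "<head" in head
        or "<body" in head
    )

print(looks_like_html("<!DOCTYPE html><html><head><title>Acme</title></head>"))  # True
print(looks_like_html("# Acme Analytics\n\n> Real-time product analytics.\n"))   # False
```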

Mistake 2: Missing H1

The spec's only required element. We've seen llms.txt files that start with ## Documentation directly. Without an H1, AI agents may skip the file entirely or apply a generic project name from the URL.

Fix: start the file with # Your Site Name on the first non-blank line.

Mistake 3: Bare URLs instead of markdown links

Authors paste URLs as-is, like - https://example.com/docs. This works technically, but wastes the description slot. AI agents pull the link text as context — bare URLs give them nothing extra.

Fix: use the [Title](url): description format every time.

Mistake 4: HTML anchor tags

Often a result of copy-paste from a rendered web page. Something like <a href="...">Quickstart</a>. Markdown parsers may or may not extract these correctly.

Fix: convert to markdown link syntax. If your CMS exports HTML, run it through a markdown converter before publishing.

Mistake 5: Generic anchor text ("Click here")

Example: - [Click here](https://example.com/pricing): Pricing page. The description redeems it, but the link text tells the AI nothing. Some agents may de-prioritize generic-text links when picking which to fetch.

Fix: descriptive link text that stands alone. [Pricing plans](https://example.com/pricing).

Mistake 6: Auth-walled or paywalled links

Listing a URL that returns 401 or redirects to a login page is pointing AI at a dead end. Anonymous AI agents can't authenticate. They get a login screen and abandon the link.

Fix: either expose the content publicly or remove it from llms.txt. If you have to gate it, link to the public marketing page that previews the gated content instead.

Mistake 7: Links that conflict with your own robots.txt

Your llms.txt lists https://example.com/internal/api, but your robots.txt has Disallow: /internal/. Well-behaved AI agents follow robots.txt and refuse to load it. You've advertised a dead end.

Fix: cross-check your llms.txt URLs against your own robots.txt. Pick one rule or the other — don't contradict yourself.
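The cross-check can be automated with the standard library. In this sketch, `disallowed_links` is a hypothetical helper and the default user agent is an arbitrary choice; run it once per crawler you want to support.

```python
from urllib.robotparser import RobotFileParser

def disallowed_links(robots_txt: str, urls: list, user_agent: str = "GPTBot") -> list:
    """Return the URLs that robots.txt forbids `user_agent` from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]

ROBOTS = "User-agent: *\nDisallow: /internal/\n"
LINKS = ["https://example.com/internal/api", "https://example.com/docs"]
print(disallowed_links(ROBOTS, LINKS))  # ['https://example.com/internal/api']
```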

Mistake 8: File too large (over 500KB)

A 1MB llms.txt is "technically valid" but defeats its own purpose. AI agents may truncate it, refuse to load it, or burn so much context window that there's nothing left for the actual page content.

Fix: split. Keep /llms.txt as a focused curated index (under 50KB). Move the bulk content to /llms-full.txt. AI agents with budget will grab the full file; agents with tight budget stick to the index.

Mistake 9: Stale content

You shipped an llms.txt in January 2025. Now it's May 2026, you've renamed three sections, killed a product line, and added a new feature. The llms.txt still points at the January URLs. AI agents follow stale links to dead pages, then fall back to web search anyway.

Fix: treat llms.txt like a sitemap. Regenerate quarterly, or whenever navigation changes.

Mistake 10: No /llms-full.txt companion

Not a critical mistake — it's a missed opportunity. AI agents with generous context budgets can answer better when they can grab full content in one shot. Shipping /llms-full.txt alongside the curated index gives you both surfaces.

Fix: most static site generators have plugins that auto-generate /llms-full.txt from your markdown content. Or write a small build script. Or use a tool like ours that auto-generates both.
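As a rough idea of what such a build script does, the sketch below concatenates page content into one markdown document. The spec does not define the layout of llms-full.txt, so the heading-per-page structure, the "Source:" line, and the name `build_llms_full` are all assumptions. Loading the pages from disk or a CMS is left out.

```python
def build_llms_full(site_name: str, pages: list) -> str:
    """pages is a list of (title, url, markdown_body) tuples, in the order to emit."""
    parts = [f"# {site_name}", ""]
    for title, url, body in pages:
        # One H2 per page, followed by its source URL and full content.
        parts += [f"## {title}", "", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts)

pages = [
    ("Quickstart", "https://acme.com/docs/quickstart", "Install the SDK, then send an event.\n"),
    ("Pricing", "https://acme.com/pricing", "Free, Pro, and Enterprise tiers.\n"),
]
print(build_llms_full("Acme Analytics", pages))
```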

9. Generating Yours with AI

Hand-curating an llms.txt for a small marketing site takes 30 minutes. For a 500-page documentation site, it's brutal — you're manually deciding which URLs make the cut, which section each one belongs in, what description gives the AI useful signal, and so on. That's why AI-powered generators exist.

Our LLMs.txt Checker includes a generator that runs in four steps:

  1. Crawl — automatically discovers up to 50 of your most important pages, prioritized by homepage status, internal link count, and content freshness.
  2. AI categorize — feeds the page list (URL + title + meta description) to our centralized AI service (DeepSeek primary, Gemini and Claude fallback). The AI groups pages into H2 sections and writes the H1 + blockquote summary.
  3. Edit — returns an editable markdown textarea. You refine section names, tighten descriptions, move low-priority URLs into the Optional section.
  4. Ship — copy or download the final file. Upload to your domain root as /llms.txt.

The generator uses 50 credits per run, refunded automatically if generation fails or returns malformed output. Worst-case cost is zero. Typical run time: 30-90 seconds.

A note on AI generator output

Always edit the AI output before publishing. The AI is good at structure and categorization — it's less good at brand voice, product nuance, and knowing which obscure page secretly drives 30% of your support tickets. Treat the output as a strong first draft, not a finished file.

10. Frequently Asked Questions

Will AI agents actually load my llms.txt, or is this all theoretical?

They actually load it. Inference-time loading by ChatGPT browse mode, Claude, Perplexity, and Gemini is now standard behavior when a query implies a specific domain. Adoption on the AI agent side is mature; the bottleneck is on the publisher side — most sites still haven't shipped one.

Should I publish llms.txt if my site is mostly marketing pages (no docs)?

Yes. Marketing sites benefit just as much — maybe more. When AI agents answer questions like "what does Acme do?" or "how does Acme's pricing compare?", you want the agent to load your framing, not a competitor's blog post or an old Reddit thread.

What about multi-language sites?

The spec doesn't define multi-language behavior. Two common patterns in production: one master llms.txt with cross-language sections (e.g., "Documentation (English)", "Documentation (Japanese)"), or per-locale subpaths like /en/llms.txt and /ja/llms.txt. The latter isn't spec-required and AI agents may not auto-discover the per-locale files — the master-file approach is safer.

Does llms.txt replace sitemap.xml?

No. They're different files for different audiences. sitemap.xml is the exhaustive URL list for traditional search engines (Google, Bing). llms.txt is the curated content index for AI agents. Ship both.

What if I have content I don't want AI to use for training?

llms.txt is for inference-time guidance — what the AI loads when answering a user query. Training-time opt-out is a separate problem. For that, use AI-specific User-agent Disallow rules in robots.txt (GPTBot, ClaudeBot training crawlers honor these) or the emerging /ai.txt standard. They're complementary, not alternatives.

Can a single llms.txt file have thousands of links?

Technically yes, practically no. Past about 200 links the file exceeds 50KB and you start losing AI agents that have tight context budgets. If you genuinely have thousands of URLs worth surfacing, curate the top ~50 into llms.txt and put the full list in llms-full.txt.

Is there a way to validate my llms.txt automatically?

Yes — our LLMs.txt Checker validates across 14 parameters in about 30 seconds. Free, no sign-up to view results.

Will llms.txt still matter in 2027?

Almost certainly. Either: (a) the spec stays the de facto standard and adoption keeps growing — which is the current trajectory; or (b) it gets superseded by a successor spec, in which case sites that shipped llms.txt will be in the same position (or better) when they migrate. Either path rewards shipping now.

What's the difference between ai.txt, llms.txt, and robots.txt?

Three files, three purposes. robots.txt: traditional crawler access control ("don't crawl this URL"). ai.txt: AI training opt-out (separate emerging spec). llms.txt: inference-time content curation for AI agents. They coexist. See our companion post llms.txt vs robots.txt for the full comparison.

Ship your llms.txt today

Validate yours across 14 spec-grade parameters in 30 seconds. Don't have one? Our AI generator crawls your site and produces a spec-conformant llms.txt in under 2 minutes.
