What is llms.txt and why does it matter for SEO in 2026?

llms.txt is a markdown file at your domain root (example.com/llms.txt) that tells AI agents — ChatGPT, Claude, Perplexity, Gemini — which pages to load when answering user queries about your site. As AI search replaces traditional SERPs for many query types, llms.txt has become the curated entry point into the AI context window.

Who created the llms.txt standard?

Jeremy Howard at Answer.AI proposed the standard in September 2024. The spec is hosted at llmstxt.org. By mid-2026 it has been adopted by Anthropic, Vercel, Cloudflare, Mintlify, Hugging Face, and thousands of smaller documentation-heavy and product sites.

Where should the llms.txt file be placed?

At the root of your domain over HTTPS: yourdomain.com/llms.txt. Serve the file with Content-Type text/markdown or text/plain. The spec allows subpath files such as /docs/llms.txt as an option, but most AI agents only check the root, so a file that exists only in a subdirectory is unlikely to be discovered.

What is the difference between llms.txt and llms-full.txt?

llms.txt is the curated index — a markdown table of contents with links to your most valuable pages, ideally under 50KB. llms-full.txt is the full content dump — every page inlined as markdown in one large file, which can be megabytes. AI agents pick whichever fits their context window. Anthropic and Mintlify ship both.

Is llms.txt required, or just a nice-to-have?

It is not required by any official standard, but it has become the de facto convention. Sites without llms.txt depend on AI agents scraping their HTML and inferring structure — which is noisy, often outdated, and frequently misses the context the site owner wanted to surface.

Does llms.txt work for marketing sites or only documentation?

Both. Documentation-heavy sites see the most obvious value (AI agents directly answer developer questions), but any site that wants control over how AI describes its product benefits — including pricing pages, feature comparisons, case studies, and even blog content.

Should I block AI bots in robots.txt while publishing llms.txt?

No. That is self-defeating. Publishing llms.txt says "AI agents, please use this file." Blocking those same agents — GPTBot, ClaudeBot, PerplexityBot, Google-Extended — in robots.txt tells them to go away. Well-behaved AI agents check both files; if robots.txt disallows them, the llms.txt is ignored entirely.

How large can the llms.txt file be?

Under 50KB is ideal for the curation purpose. Under 150KB is acceptable. Files over 500KB defeat the purpose — AI agents may truncate or refuse to load them. For bulk content, ship /llms-full.txt alongside the curated /llms.txt.

How often should I update llms.txt?

Treat it like a sitemap. Update when major content launches (new product area, doc rewrite), when navigation changes (URL restructure, renamed sections), and quarterly at minimum so AI agents stay aligned with your current site structure.

Can I generate llms.txt automatically?

Yes. InstaRank SEO and other tools offer generators that crawl your site, pick the most important pages, and produce a spec-conformant llms.txt with curated H2 sections, blockquote summary, and proper markdown links. You edit the output before publishing.

llms.txt: The Complete 2026 SEO Guide for AI Search

May 26, 2026 • 14 min read • Technical SEO

When a user asks ChatGPT "what does Stripe's pricing look like?", the AI doesn't scroll Google results. It loads stripe.com/llms.txt first. That single file decides what context the AI uses to answer — and what it says about your product. llms.txt is the new front door, and most sites still don't have one. This guide covers everything you need to ship a spec-conformant llms.txt in 2026, from the original spec by Jeremy Howard at Answer.AI to the validation checklist we built into our free LLMs.txt Checker.

TL;DR Summary

- llms.txt is a markdown file at yourdomain.com/llms.txt that tells AI agents which pages matter most.
- It was proposed by Jeremy Howard at Answer.AI in September 2024 and has already been adopted by Anthropic, Vercel, Cloudflare, Mintlify, and Hugging Face.
- The format is pure markdown: H1 (required) + blockquote summary + H2 sections of curated links.
- Different from robots.txt: robots.txt gatekeeps access, while llms.txt actively curates content for AI to load at inference time.
- Pair it with /llms-full.txt for AI agents that want full content depth in one fetch.
- Keep it under 50KB ideally. Over 500KB defeats the curation purpose; use llms-full.txt for bulk.
- Don't publish llms.txt while blocking AI bots in robots.txt. That's self-defeating.
- Use our LLMs.txt Checker to validate yours against the full llms.txt spec. One-click generator included.

1. Why llms.txt Matters in 2026

For 25 years, SEO meant one thing: ranking on Google. You wrote content, you hoped Google would crawl it, and you measured success in SERP positions and organic clicks. That world is fading fast.

By mid-2026, roughly half of search-style queries go through AI assistants instead. Someone asks ChatGPT "what's the best PostgreSQL ORM for Node.js?", Claude "how does Stripe's pricing compare to Adyen's?", or Perplexity "is Vercel cheaper than AWS Amplify for my use case?". The AI answers directly. The user never sees a SERP. The question for site owners stopped being "does Google know about us?" and became "does the AI know what to load when our domain comes up?"

That's exactly what llms.txt solves. You publish one markdown file at yourdomain.com/llms.txt. AI agents check it, follow the curated links you list, and use that content to answer questions about you. Every agent that supports the standard honors the file, and adoption is growing fast.

The compounding cost of not having one

Without llms.txt, an AI agent answering a question about your domain falls back to whatever Google or Bing happens to have indexed, which is usually:

- Outdated. Search engines might still surface a pricing page from 2024.
- Noisy. The AI might cite your customer review section instead of your actual feature list.
- Wrong. Without curation, AI agents have surfaced competitor mentions or blog comments as if they were your official position.
- Missing the framing you wanted. You lose control of the first impression.

Shipping an llms.txt is a few hours of work. The asset compounds over the next 5+ years of AI search adoption. That's about as good a leverage ratio as you'll find in technical SEO.

2. The Complete llms.txt Spec

Jeremy Howard at Answer.AI proposed the standard in September 2024. The spec lives at llmstxt.org and is intentionally minimal — pure markdown, no XML, no JSON schema. That minimalism is the point: any LLM can parse it without specialized libraries.

The full structure

# Site Name

> One-sentence summary — what the site is and who it's for.

Optional free-form prose between the blockquote and the first H2.

## Section Name

- [Link title](https://example.com/path): Optional description
- [Another link](https://example.com/other)

## Another Section

- [Page title](https://example.com/section/page)

## Optional

- [Lower-priority page](https://example.com/changelog): Skippable for tight context

What's required

The spec is almost aggressively minimal:

H1 with site or project name — this is the only required element of the entire spec. Everything else is optional, but recommended.

What's strongly recommended

- Blockquote summary (>) directly after the H1. It gives the AI one sentence of project framing before it scans the link list. Without it, the AI infers context from URLs, which is unreliable.
- One or more H2 sections with markdown link lists. Without these, the file is just a title and provides zero navigation value to AI agents.
- An H2 literally named ## Optional. The spec defines it as content AI agents can skip when context is tight.
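To make the structure concrete, here is a minimal Python sketch of a reader for this layout. The `LlmsTxt` class and `parse_llms_txt` function are illustrative names, not part of the spec or of any library, and the parser deliberately handles only the elements described above (H1, blockquote, H2 sections, list-item links).

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# "- [Title](url): optional description"
LINK_RE = re.compile(
    r"^[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: Optional[str] = None      # H1, the only required element
    summary: Optional[str] = None    # blockquote after the H1
    sections: dict = field(default_factory=dict)  # H2 name -> [(title, url, desc)]


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is None and line.startswith("# ") and doc.title is None:
            doc.title = line[2:].strip()
        elif current is None and line.startswith(">") and doc.summary is None:
            doc.summary = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return doc
```

A file with no H1 comes back with `title` set to `None`, which is the one condition the spec treats as invalid; a missing blockquote or an empty `sections` dict only means the recommended parts are absent.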

File location rules

- Root over HTTPS: https://yourdomain.com/llms.txt
- Subpath files like /docs/llms.txt are permitted by the spec as an option, but most AI agents only check the root, so don't rely on them being discovered
- Must respond with HTTP 200
- Served with Content-Type: text/markdown or text/plain
- Never text/html, which signals to AI agents that you mis-routed the request
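These serving rules are easy to check mechanically. The sketch below is one way to do it in Python, assuming you already have the status code, Content-Type header, and body from whatever HTTP client you use; `check_llms_response` is a hypothetical helper name, and the HTML sniffing is a simple heuristic rather than anything the spec prescribes.

```python
ACCEPTED_TYPES = {"text/markdown", "text/plain"}


def check_llms_response(status: int, content_type: str, body: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")

    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or 'missing'}")

    # Heuristic: a body that opens like an HTML page usually means a SPA
    # route or a custom 404 page answered instead of the static file.
    if body.lstrip().lower().startswith(("<!doctype html", "<html")):
        problems.append("body looks like HTML, not markdown")
    return problems
```

Keeping the check separate from the fetch makes it easy to run against recorded responses or in a deploy pipeline.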

Link format rules

Inside an H2 section, every link must be a markdown list item:

# Good
- [Quickstart](https://example.com/docs/start): Five-minute setup

# Also good (no description)
- [API Reference](https://example.com/api)

# Bad — bare URL
- https://example.com/docs/start

# Bad — HTML anchor
- <a href="https://example.com/docs/start">Quickstart</a>

# Bad — generic anchor text
- [Click here](https://example.com/docs/start): Quickstart guide

The [text](url): description shape gives the AI three independent signals: link text, URL path, and one-line description. A bare URL loses two of those three signals, and generic anchor text ("click here", "read more") wastes the link-text signal.
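The good and bad shapes above can be told apart with a couple of regular expressions. This is a rough Python sketch, not the checker's actual implementation: `classify_link_line` and its category labels are made up for illustration, the generic-text list contains only the two phrases mentioned here, and the pattern assumes absolute http(s) URLs.

```python
import re

# "- [text](https://...)" with an optional ": description" tail
MD_LINK = re.compile(
    r"^-\s+\[(?P<text>[^\]]+)\]\((?P<url>https?://[^)\s]+)\)(?::\s*(?P<desc>.+))?$"
)
BARE_URL = re.compile(r"^-\s+https?://\S+$")
GENERIC_TEXT = {"click here", "read more"}  # extend as needed


def classify_link_line(line: str) -> str:
    line = line.strip()
    if "<a " in line.lower():
        return "html-anchor"
    match = MD_LINK.match(line)
    if match is None:
        return "bare-url" if BARE_URL.match(line) else "not-a-link"
    if match["text"].strip().lower() in GENERIC_TEXT:
        return "generic-text"
    return "ok"
```

Running it over every list item in a file gives a quick count of lines that need rewriting before you publish.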

Important note on absolute URLs

The spec doesn't mandate absolute vs relative URLs, but every production adopter uses absolute URLs. Absolute URLs survive non-root deployments, partial fetches, and quoting in AI responses. Use them.

3. llms.txt vs llms-full.txt

Reading the spec, you'll see references to a companion file: /llms-full.txt. The two files solve different problems for AI agents with different context budgets.

| | llms.txt | llms-full.txt |
| --- | --- | --- |
| Purpose | Curated index | Full content dump |
| Format | Markdown table of contents with links | Inlined markdown of every page |
| Target size | Under 50KB ideal | Can be 1-10 MB |
| When AI agents use it | Tight context budget; need navigation | Generous budget; need depth |
| Ship both? | Yes, that's the Anthropic / Mintlify pattern | |

Think of it like this: llms.txt is the table of contents. llms-full.txt is the whole book inlined. AI agents grab whichever fits their working memory.

Our checker rewards shipping a /llms-full.txt companion. It's not required by the spec, but it's how the most adoption-forward sites ship.

4. Real-World Examples from Production Sites

Looking at what production sites are actually shipping is more useful than theorizing. Here are the patterns from major adopters as of May 2026.

Anthropic (docs.anthropic.com/llms.txt)

The reference implementation. Crisp H1, one-sentence blockquote, ~10 H2 sections grouped by audience ("Building with Claude", "API Reference", "Agents", "Tools"), and a paired /llms-full.txt. Total size: ~12KB. Every link is absolute. No bare URLs. No HTML anchors. Every link has a description after a colon. If you want to see what "clean" looks like, this is it.

Vercel (vercel.com/llms.txt)

The maximalist version. 17 H2 sections covering their entire documentation taxonomy, including Access, AI, Build & Deploy, CDN, CLI, Collaboration, Compute, Flags, Integrations, Multi-tenant, Observability, Platform, Pricing, Security, Storage, and REST API Reference. ~150KB total. That's well above the "ideal" 50KB threshold, but Vercel has so many doc surfaces that the depth is genuinely useful. Absolute URLs throughout.

Cloudflare (per-product files)

Different approach: Cloudflare ships per-product llms.txt files (Workers, Pages, R2, etc.) under their docs subdomains rather than one master file at the apex. The spec technically allows this, but it's less common — most AI agents only check the root. If you go this route, your apex llms.txt should at minimum link out to the per-product files.

Mintlify (auto-generated for every customer)

Mintlify auto-generates both /llms.txt and /llms-full.txt for every customer documentation site. The result: every Mintlify-hosted site has a spec-conformant llms.txt by default, regardless of whether the site owner explicitly opted in. This is the pattern other docs-platform vendors are starting to copy.

A minimal example for a typical SaaS site

You don't need 17 sections to ship a useful llms.txt. Here's a minimal one that covers the bases for a typical SaaS:

# Acme Analytics

> Acme is real-time product analytics for engineering teams. Connects to your warehouse in 5 minutes.

## Product

- [Features](https://acme.com/features): Real-time dashboards, alerts, audit logs
- [Pricing](https://acme.com/pricing): Free / Pro / Enterprise pricing tiers
- [Integrations](https://acme.com/integrations): Snowflake, BigQuery, Postgres, MySQL

## Documentation

- [Quickstart](https://acme.com/docs/quickstart): Five-minute setup from npm install to first event
- [API Reference](https://acme.com/api): Full endpoint catalog with code samples
- [SDKs](https://acme.com/docs/sdks): Official libraries in 8 languages

## Customers

- [Case studies](https://acme.com/customers): Real-world deployments
- [Reviews](https://acme.com/reviews): G2 and TrustRadius coverage

## Optional

- [Changelog](https://acme.com/changelog): Version history (skippable)
- [Blog](https://acme.com/blog): Engineering posts
- [Press](https://acme.com/press): Coverage and announcements

That's under 1KB and hits every spec recommendation. You could ship this today.

5. The llms.txt Validation Checklist

Our LLMs.txt Checker validates your file against the full spec. Here's everything it checks, roughly in order of impact:

| # | What we check |
| --- | --- |
| 1 | File exists at `/llms.txt` (HTTP 200) |
| 2 | Has H1 title (spec-required) |
| 3 | Has at least one H2 + markdown link list |
| 4 | All sampled URLs return 200 (probes up to 25 links) |
| 5 | Has blockquote summary directly after H1 |
| 6 | Served as `text/markdown` or `text/plain` |
| 7 | No duplicate links |
| 8 | Markdown link format (no bare URLs / HTML anchors / generic text) |
| 9 | No auth-walled links (no 401/403 or login redirects) |
| 10 | Healthy file size (under 500KB) |
| 11 | Linked URLs not blocked by robots.txt |
| 12 | Site's robots.txt allows AI bots (GPTBot, ClaudeBot, etc.) |
| 13 | Bonus: `/llms-full.txt` companion exists |
| 14 | Bonus: HTML discovery tag (`<link rel="llms">`) |

The checks toward the top matter most. A missing H1 is the single biggest problem — the spec literally calls it the only required element. Broken links hurt almost as much, because they actively degrade the AI's answer quality.

A note on cross-checks

The robots.txt cross-checks compare your llms.txt against your robots.txt. When your robots.txt is unavailable, the checker skips these rather than penalizing you — so your score still represents your llms.txt quality, not your infrastructure availability.

6. How AI Agents Actually Use llms.txt

The spec describes what the file is. It doesn't describe how AI agents consume it. That part is implementation-defined per agent, but the pattern is converging. Here's the typical flow:

Inference-time loading

When a user query mentions or implies your domain, the AI agent:

1. Resolves the domain (e.g., extracts "stripe.com" from the query).
2. Issues a GET request to https://stripe.com/llms.txt.
3. If the response is 200 OK with a markdown content type, parses the H1, blockquote, and H2 sections.
4. Identifies which sections look most relevant to the query.
5. Fetches 2-5 URLs from those sections.
6. Synthesizes an answer using that content.

The whole flow takes 1-3 seconds and happens before the AI generates the answer the user sees. If the agent can't find or parse your llms.txt — file missing, server returns HTML, content-type wrong — it falls back to web search. You lose the curated path entirely.
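The "pick the relevant sections, then fetch a few URLs" part of that flow can be sketched in Python. How real agents score relevance is not documented, so the keyword overlap below is only a stand-in, and `pick_urls` is a hypothetical function name; the fetch itself is left out so the example stays self-contained.

```python
import re


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_urls(llms_text: str, query: str, max_links: int = 5) -> list:
    """Return up to max_links URLs whose section, title, or description
    shares at least one word with the query, best matches first."""
    query_words = _words(query)
    scored = []  # (overlap score, url) in file order
    section = None
    for raw in llms_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = re.match(r"^-\s+\[([^\]]+)\]\(([^)\s]+)\)(.*)$", line)
        if section is None or match is None:
            continue
        title, url, rest = match.groups()
        score = len(_words(f"{section} {title} {rest}") & query_words)
        if score > 0:
            scored.append((score, url))
    # sort() is stable, so ties keep the order the site owner chose
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:max_links]]
```

Even in this toy version, descriptive link text and descriptions are what let a link score at all, which is why bare URLs and generic anchor text are costly.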

Context window economics

AI agents have finite context windows. Loading your llms.txt costs them some of that budget. A 50KB llms.txt is roughly 12,000 tokens — significant but acceptable. A 500KB llms.txt is 120,000 tokens — most agents will refuse to load it or truncate aggressively.

That's why curation matters. Every link you list takes context budget. Listing 200 mediocre links is worse than listing 20 critical ones — the AI has to spend cycles deciding which to fetch, and may run out of budget before getting to the page that actually answers the query.
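The size-to-token figures above can be reproduced with a one-line estimate. The Python sketch below assumes roughly 4 bytes of English markdown per token, which is a common rule of thumb rather than a property of any particular model; `estimate_tokens` is an illustrative name.

```python
def estimate_tokens(size_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Very rough token estimate for English markdown of a given size."""
    return round(size_bytes / bytes_per_token)


ideal = estimate_tokens(50_000)       # 50KB curated index
oversized = estimate_tokens(500_000)  # 500KB file
print(ideal, oversized)  # 12500 125000
```

Under that assumption the estimates land at about 12,500 and 125,000 tokens, in line with the rounded figures of 12,000 and 120,000. Actual counts depend on the tokenizer and on how URL-heavy the file is.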

The Optional section as a release valve

The spec includes a special ## Optional section name with a specific semantic: AI agents can skip these URLs when context is tight. Put low-priority content here — changelog, press coverage, marketing pages — so the AI knows to drop them first under budget pressure.
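A consumer under budget pressure could apply that rule as follows. This is a minimal Python sketch of the idea, not a description of how any specific agent behaves; `trim_to_budget` and the byte-based budget are assumptions made for the example.

```python
def trim_to_budget(text: str, max_bytes: int) -> str:
    """Return the file unchanged if it fits, otherwise drop the
    '## Optional' section and everything listed under it."""
    if len(text.encode("utf-8")) <= max_bytes:
        return text
    kept = []
    skipping = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Every H2 starts a new section; only "Optional" is skipped.
            skipping = line[3:].strip() == "Optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"
```

Because the section name carries the meaning, a heading like "## Extras" would not be dropped by a consumer written this way, so use the exact name.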

7. The robots.txt Trap (and How to Avoid It)

The single biggest mistake we see in production: publishing llms.txt while blocking AI bots in robots.txt. This is self-defeating, and it's shockingly common.

Here's the broken setup:

# robots.txt — BROKEN if you also publish llms.txt

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

Why this breaks: having an llms.txt file is an active invitation to AI agents. Blocking those same agents in robots.txt is telling them to go away. Well-behaved AI agents check both files. If robots.txt disallows them, they skip your entire site — including the llms.txt you so carefully curated.

The fix is straightforward — explicitly allow the AI bots you want to read llms.txt:

# robots.txt: works with llms.txt

User-agent: *
Disallow: /admin/
Disallow: /private/

# A crawler obeys only the most specific group that matches it, so the
# named AI bots need their own copy of the Disallow rules.
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: anthropic-ai
User-agent: CCBot
Allow: /
Disallow: /admin/
Disallow: /private/

Sitemap: https://example.com/sitemap.xml

Our checker flags this contradiction so you can fix it. Easy fix, big upside.
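You can reproduce a basic version of this check with Python's standard-library `urllib.robotparser`. The sketch below is illustrative only: `blocked_ai_bots` is a hypothetical helper, the bot list is the one named in this section, and the standard-library parser's rule matching may differ in edge cases from what a given crawler implements.

```python
from urllib.robotparser import RobotFileParser

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def blocked_ai_bots(robots_txt: str, llms_url: str, bots=AI_BOTS) -> list:
    """Return the bots that robots.txt forbids from fetching llms_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network
    return [bot for bot in bots if not parser.can_fetch(bot, llms_url)]


broken = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""
print(blocked_ai_bots(broken, "https://example.com/llms.txt"))
# ['GPTBot', 'ClaudeBot']
```

An empty list means none of the listed bots is locked out of the file you are inviting them to read.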

8. Common Mistakes and How to Fix Them

Here's what we see most often across the sites we check — ranked by frequency.

Mistake 1: Server returns HTML instead of markdown

You publish an llms.txt, but when you visit yourdomain.com/llms.txt in the browser, you see your homepage. That means a SPA route or custom 404 page is catching the request. AI agents expect markdown — they bail when they see HTML.

Fix: configure your server (Next.js public folder, Vercel static routing, Cloudflare Pages, whatever) to serve /llms.txt as a static file with Content-Type: text/markdown.

Mistake 2: Missing H1

The spec's only required element. We've seen llms.txt files that start with ## Documentation directly. Without an H1, AI agents may skip the file entirely or apply a generic project name from the URL.

Fix: start the file with # Your Site Name on the first non-blank line.

Mistake 3: Bare URLs instead of markdown links

Authors paste URLs as-is, like - https://example.com/docs. This works technically, but wastes the link-text and description slots. AI agents pull the link text as context, and a bare URL gives them nothing beyond the path.

Fix: use the [Title](url): description format every time.

Mistake 4: HTML anchor tags

Often a result of copy-paste from a rendered web page. Something like <a href="...">Quickstart</a>. Markdown parsers may or may not extract these correctly.

Fix: convert to markdown link syntax. If your CMS exports HTML, run it through a markdown converter before publishing.

Mistake 5: Generic anchor text ("Click here")

Example: `- [Click here](https://example.com/pricing): Pricing page`. The description redeems it, but the link text tells the AI nothing. Some agents may de-prioritize generic-text links when picking which to fetch.

Fix: descriptive link text that stands alone. [Pricing plans](https://example.com/pricing).

Mistake 6: Auth-walled or paywalled links

Listing a URL that returns 401 or redirects to a login page is pointing AI at a dead end. Anonymous AI agents can't authenticate. They get a login screen and abandon the link.

Fix: either expose the content publicly or remove it from llms.txt. If you have to gate it, link to the public marketing page that previews the gated content instead.

Mistake 7: Links that conflict with your own robots.txt

Your llms.txt lists https://example.com/internal/api, but your robots.txt has Disallow: /internal/. Well-behaved AI agents follow robots.txt and refuse to load. You've advertised a dead end.

Fix: cross-check your llms.txt URLs against your own robots.txt. Pick one rule or the other — don't contradict yourself.
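That cross-check is straightforward to script. The following Python sketch uses the standard-library `urllib.robotparser`; the function name `links_blocked_by_robots` and the default agent are assumptions for the example, and only absolute http(s) links in markdown link syntax are considered.

```python
import re
from urllib.robotparser import RobotFileParser

MD_URL = re.compile(r"\]\((https?://[^)\s]+)\)")


def links_blocked_by_robots(llms_txt: str, robots_txt: str,
                            agent: str = "ClaudeBot") -> list:
    """Return llms.txt URLs that robots.txt disallows for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = MD_URL.findall(llms_txt)
    return [url for url in urls if not parser.can_fetch(agent, url)]
```

Run it once per user agent you care about, since a site can allow one bot into a path and block another.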

Mistake 8: File too large (over 500KB)

A 1MB llms.txt is "technically valid" but defeats its own purpose. AI agents may truncate it, refuse to load it, or burn so much context window that there's nothing left for the actual page content.

Fix: split. Keep /llms.txt as a focused curated index (under 50KB). Move the bulk content to /llms-full.txt. AI agents with budget will grab the full file; agents with tight budget stick to the index.

Mistake 9: Stale content

You shipped an llms.txt in January 2025. Now it's May 2026, you've renamed three sections, killed a product line, and added a new feature. The llms.txt still points at the January URLs. AI agents follow stale links to dead pages, then fall back to web search anyway.

Fix: treat llms.txt like a sitemap. Regenerate quarterly, or whenever navigation changes.

Mistake 10: No /llms-full.txt companion

Not a critical mistake — it's a missed opportunity. AI agents with generous context budgets can answer better when they can grab full content in one shot. Shipping /llms-full.txt alongside the curated index gives you both surfaces.

Fix: most static site generators have plugins that auto-generate /llms-full.txt from your markdown content. Or write a small build script. Or use a tool like ours that auto-generates both.
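A small build script for the companion file can be very short. The Python sketch below assumes you can already produce each page as a (title, URL, markdown body) triple; `build_llms_full` is a hypothetical name, and since the layout of llms-full.txt is not standardized, the "H2 per page plus Source line" format is just one reasonable choice.

```python
def build_llms_full(site_name: str, pages: list) -> str:
    """Concatenate pages into one markdown document.

    pages is a list of (title, url, markdown_body) tuples, in the order
    they should appear in the output.
    """
    parts = [f"# {site_name}\n"]
    for title, url, body in pages:
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}\n")
    return "\n".join(parts)


pages = [
    ("Quickstart", "https://acme.com/docs/quickstart", "Install the SDK.\n"),
    ("API Reference", "https://acme.com/api", "All endpoints.\n"),
]
full_text = build_llms_full("Acme Analytics", pages)
# Write full_text to llms-full.txt in your site's static output directory.
```

Generating the file at build time from the same source as your pages keeps it from drifting out of date.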

9. Generating Yours with AI

Hand-curating an llms.txt for a small marketing site takes 30 minutes. For a 500-page documentation site, it's brutal — you're manually deciding which URLs make the cut, which section each one belongs in, what description gives the AI useful signal, and so on. That's why AI-powered generators exist.

Our LLMs.txt Checker includes a generator that runs in four steps:

1. Crawl: automatically discovers up to 50 of your most important pages, prioritized by homepage status, internal link count, and content freshness.
2. AI categorize: feeds the page list (URL + title + meta description) to an AI model. It groups pages into H2 sections and writes the H1 + blockquote summary.
3. Edit: returns an editable markdown textarea. You refine section names, tighten descriptions, move low-priority URLs into the Optional section.
4. Ship: copy or download the final file. Upload to your domain root as /llms.txt.

The generator uses 50 credits per run, refunded automatically if generation fails or returns malformed output. Worst-case cost is zero. Typical run time: 30-90 seconds.

A note on AI generator output

Always edit the AI output before publishing. The AI is good at structure and categorization — it's less good at brand voice, product nuance, and knowing which obscure page secretly drives 30% of your support tickets. Treat the output as a strong first draft, not a finished file.

10. Frequently Asked Questions

Will AI agents actually load my llms.txt, or is this all theoretical?

They actually load it. Inference-time loading by ChatGPT browse mode, Claude, Perplexity, and Gemini is now standard behavior when a query implies a specific domain. Adoption on the AI agent side is mature; the bottleneck is on the publisher side — most sites still haven't shipped one.

Should I publish llms.txt if my site is mostly marketing pages (no docs)?

Yes. Marketing sites benefit just as much — maybe more. When AI agents answer questions like "what does Acme do?" or "how does Acme's pricing compare?", you want the agent to load your framing, not a competitor's blog post or an old Reddit thread.

What about multi-language sites?

The spec doesn't define multi-language behavior. Two common patterns in production: one master llms.txt with cross-language sections (e.g., "Documentation (English)", "Documentation (Japanese)"), or per-locale subpaths like /en/llms.txt and /ja/llms.txt. The latter isn't spec-required and AI agents may not auto-discover the per-locale files — the master-file approach is safer.

Does llms.txt replace sitemap.xml?

No. They're different files for different audiences. sitemap.xml is the exhaustive URL list for traditional search engines (Google, Bing). llms.txt is the curated content index for AI agents. Ship both.

What if I have content I don't want AI to use for training?

llms.txt is for inference-time guidance — what the AI loads when answering a user query. Training-time opt-out is a separate problem. For that, use AI-specific User-agent Disallow rules in robots.txt (GPTBot, ClaudeBot training crawlers honor these) or the emerging /ai.txt standard. They're complementary, not alternatives.

Can a single llms.txt file have thousands of links?

Technically yes, practically no. Past about 200 links the file exceeds 50KB and you start losing AI agents that have tight context budgets. If you genuinely have thousands of URLs worth surfacing, curate the top ~50 into llms.txt and put the full list in llms-full.txt.

Is there a way to validate my llms.txt automatically?

Yes — our LLMs.txt Checker validates against the full llms.txt spec in about 30 seconds. Free.

Will llms.txt still matter in 2027?

Almost certainly. Either: (a) the spec stays the de facto standard and adoption keeps growing — which is the current trajectory; or (b) it gets superseded by a successor spec, in which case sites that shipped llms.txt will be in the same position (or better) when they migrate. Either path rewards shipping now.

What's the difference between ai.txt, llms.txt, and robots.txt?

Three files, three purposes. robots.txt: traditional crawler access control ("don't crawl this URL"). ai.txt: AI training opt-out (separate emerging spec). llms.txt: inference-time content curation for AI agents. They coexist. See our companion post llms.txt vs robots.txt for the full comparison.

Ship your llms.txt today

Validate yours against the full llms.txt spec in 30 seconds. Don't have one? Our generator crawls your site and produces a spec-conformant llms.txt in under 2 minutes.

Run the LLMs.txt Checker

llms.txt: The Complete 2026 SEO Guide for AI Search

May 26, 2026•14 min read•Technical SEO

TL;DR Summary

llms.txt is a markdown file at yourdomain.com/llms.txt that tells AI agents which pages matter most.
It was proposed by Jeremy Howard at Answer.AI in September 2024. Already adopted by Anthropic, Vercel, Cloudflare, Mintlify, Hugging Face.
The format is pure markdown: H1 (required) + blockquote summary + H2 sections of curated links.
Different from robots.txt: robots.txt gatekeeps access. llms.txt actively curates content for AI to load at inference time.
Pair with /llms-full.txt for AI agents that want full content depth in one fetch.
Keep it under 50KB ideally. Over 500KB defeats the curation purpose — use llms-full.txt for bulk.
Don't publish llms.txt while blocking AI bots in robots.txt — that's self-defeating.
Use our LLMs.txt Checker to validate yours against the full llms.txt spec. One-click generator included.

1. Why llms.txt Matters in 2026

For 25 years, SEO meant one thing: ranking on Google. You wrote content, you hoped Google would crawl it, and you measured success in SERP positions and organic clicks. That world is fading fast.

The compounding cost of not having one

Without llms.txt, an AI agent answering a question about your domain falls back to whatever Google or Bing happens to have indexed. Which is usually:

Outdated. Search engines might still surface a pricing page from 2024.
Noisy. The AI might cite your customer review section instead of your actual feature list.
Wrong. Without curation, AI agents have surfaced competitor mentions or blog comments as if they were your official position.
Missing the framing you wanted. You lose control of the first impression.

Shipping an llms.txt is a few hours of work. The asset compounds over the next 5+ years of AI search adoption. That's about as good a leverage ratio as you'll find in technical SEO.

2. The Complete llms.txt Spec

The full structure

# Site Name

> One-sentence summary — what the site is and who it's for.

Optional free-form prose between the blockquote and the first H2.

## Section Name

- [Link title](https://example.com/path): Optional description
- [Another link](https://example.com/other)

## Another Section

- [Page title](https://example.com/section/page)

## Optional

- [Lower-priority page](https://example.com/changelog): Skippable for tight context

What's required

The spec is almost aggressively minimal:

H1 with site or project name — this is the only required element of the entire spec. Everything else is optional, but recommended.

What's strongly recommended

Blockquote summary (>) directly after the H1. Gives the AI one sentence of project framing before scanning the link list. Without it, the AI infers context from URLs — which is unreliable.
One or more H2 sections with markdown link lists. Without these, the file is just a title — provides zero navigation value to AI agents.
An H2 literally named ## Optional — content AI agents can skip when context is tight. Spec-defined.

File location rules

HTTPS root only: https://yourdomain.com/llms.txt
Subdirectory paths like /docs/llms.txt are not part of the spec and will not be discovered
Must respond with HTTP 200
Served with Content-Type: text/markdown or text/plain
Never text/html — that signals to AI agents that you mis-routed the request

Link format rules

Inside an H2 section, every link must be a markdown list item:

# Good
- [Quickstart](https://example.com/docs/start): Five-minute setup

# Also good (no description)
- [API Reference](https://example.com/api)

# Bad — bare URL
- https://example.com/docs/start

# Bad — HTML anchor
- <a href="https://example.com/docs/start">Quickstart</a>

# Bad — generic anchor text
- [Click here](https://example.com/docs/start): Quickstart guide

Important note on absolute URLs

The spec doesn't mandate absolute vs relative URLs, but every production adopter uses absolute URLs. Absolute survives non-root deployments, partial fetches, and quoting in AI responses. Use them.

3. llms.txt vs llms-full.txt

Reading the spec, you'll see references to a companion file: /llms-full.txt. The two files solve different problems for AI agents with different context budgets.

	llms.txt	llms-full.txt
Purpose	Curated index	Full content dump
Format	Markdown table of contents with links	Inlined markdown of every page
Target size	Under 50KB ideal	Can be 1-10 MB
When AI agents use it	Tight context budget; need navigation	Generous budget; need depth
Ship both?	Yes — that's the Anthropic / Mintlify pattern

Think of it like this: llms.txt is the table of contents. llms-full.txt is the whole book inlined. AI agents grab whichever fits their working memory.

Our checker rewards shipping a /llms-full.txt companion. It's not required by the spec, but it's how the most adoption-forward sites ship.

4. Real-World Examples from Production Sites

Looking at what production sites are actually shipping is more useful than theorizing. Here are the patterns from major adopters as of May 2026.

Anthropic (docs.anthropic.com/llms.txt)

Vercel (vercel.com/llms.txt)

Cloudflare (per-product files)

Mintlify (auto-generated for every customer)

A minimal example for a typical SaaS site

You don't need 17 sections to ship a useful llms.txt. Here's a minimal one that covers the bases for a typical SaaS:

# Acme Analytics

> Acme is real-time product analytics for engineering teams. Connects to your warehouse in 5 minutes.

## Product

- [Features](https://acme.com/features): Real-time dashboards, alerts, audit logs
- [Pricing](https://acme.com/pricing): Free / Pro / Enterprise pricing tiers
- [Integrations](https://acme.com/integrations): Snowflake, BigQuery, Postgres, MySQL

## Documentation

- [Quickstart](https://acme.com/docs/quickstart): Five-minute setup from npm install to first event
- [API Reference](https://acme.com/api): Full endpoint catalog with code samples
- [SDKs](https://acme.com/docs/sdks): Official libraries in 8 languages

## Customers

- [Case studies](https://acme.com/customers): Real-world deployments
- [Reviews](https://acme.com/reviews): G2 and TrustRadius coverage

## Optional

- [Changelog](https://acme.com/changelog): Version history (skippable)
- [Blog](https://acme.com/blog): Engineering posts
- [Press](https://acme.com/press): Coverage and announcements

That's under 1KB and hits every spec recommendation. You could ship this today.

5. The llms.txt Validation Checklist

Our LLMs.txt Checker validates your file against the full spec. Here's everything it checks, roughly in order of impact:

#	What we check
1	File exists at `/llms.txt` (HTTP 200)
2	Has H1 title (spec-required)
3	Has at least one H2 + markdown link list
4	All sampled URLs return 200 (probes up to 25 links)
5	Has blockquote summary directly after H1
6	Served as `text/markdown` or `text/plain`
7	No duplicate links
8	Markdown link format (no bare URLs / HTML anchors / generic text)
9	No auth-walled links (no 401/403 or login redirects)
10	Healthy file size (under 500KB)
11	Linked URLs not blocked by robots.txt
12	Site's robots.txt allows AI bots (GPTBot, ClaudeBot, etc.)
13	Bonus: `/llms-full.txt` companion exists
14	Bonus: HTML discovery tag (`<link rel="llms">`)
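To make a few of these checks concrete, here is a minimal Python sketch of the structural ones (checks 2, 3, 5, 7, and 10). It is an illustration under our own assumptions, not the checker's actual implementation: the function name `check_structure`, the regular expression, and the result keys are hypothetical, and the network-dependent checks are left out.

```python
import re

MAX_BYTES = 500 * 1024  # size ceiling taken from check 10 above

# A list item of the form "- [Title](https://...)"; the pattern is an assumption.
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\((https?://[^)\s]+)\)")


def check_structure(text: str) -> dict[str, bool]:
    """Run a few offline structural checks on the text of an llms.txt file."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    non_blank = [ln for ln in lines if ln.strip()]

    # Check 2: the first non-blank line is an H1.
    has_h1 = bool(non_blank) and non_blank[0].startswith("# ")

    # Check 5: the next non-blank line after the H1 is a blockquote.
    has_summary = has_h1 and len(non_blank) > 1 and non_blank[1].startswith("> ")

    # Check 3: at least one markdown link item appears under an H2.
    in_h2 = False
    has_h2_links = False
    urls = []
    for ln in lines:
        if ln.startswith("## "):
            in_h2 = True
            continue
        match = LINK_ITEM.match(ln)
        if match:
            urls.append(match.group(1))
            if in_h2:
                has_h2_links = True

    return {
        "has_h1": has_h1,
        "has_summary": has_summary,
        "has_h2_links": has_h2_links,
        "no_duplicate_links": len(urls) == len(set(urls)),  # check 7
        "healthy_size": len(text.encode("utf-8")) < MAX_BYTES,  # check 10
    }
```

Running this over the file in a pre-deploy step would catch structural regressions before a crawler does.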

A note on cross-checks

6. How AI Agents Actually Use llms.txt

The spec describes what the file is. It doesn't describe how AI agents consume it. That part is implementation-defined per agent, but the pattern is converging. Here's the typical flow:

Inference-time loading

When a user query mentions or implies your domain, the AI agent:

Resolves the domain (e.g., extracts "stripe.com" from the query).
Issues a GET request to https://stripe.com/llms.txt.
If the response is 200 OK with a markdown content type, parses the H1, blockquote, and H2 sections.
Identifies which sections look most relevant to the query.
Fetches 2-5 URLs from those sections.
Synthesizes an answer using that content.
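The parsing and selection steps in that flow can be sketched in Python. This is only an illustration: real agents define relevance in their own way, so the word-overlap scoring is our assumption, as are the names `LlmsTxt`, `parse_llms_txt`, and `pick_urls`. The network steps (resolving the domain and fetching the file) are omitted so the sketch runs offline.

```python
import re
from dataclasses import dataclass, field

# "- [Title](url): description" list items; the description is optional.
LINK = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (link title, url, description)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, the blockquote summary, and the H2 link sections."""
    doc = LlmsTxt()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.summary:
            doc.summary = line[2:].strip()
        elif current is not None:
            match = LINK.match(line)
            if match:
                doc.sections[current].append(
                    (match.group(1), match.group(2), match.group(3) or "")
                )
    return doc


def pick_urls(doc: LlmsTxt, query: str, limit: int = 5) -> list[str]:
    """Rank links by word overlap with the query (a stand-in for relevance)."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for name, links in doc.sections.items():
        for title, url, desc in links:
            tokens = set(re.findall(r"[a-z0-9]+", f"{name} {title} {desc}".lower()))
            score = len(words & tokens)
            if score:
                scored.append((score, url))
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep file order
    return [url for _, url in scored[:limit]]
```

For example, `pick_urls(parse_llms_txt(text), "what does pricing cost")` returns the URLs whose section name, title, or description share a word with the query, which the agent would then fetch and read.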

Context window economics

The Optional section as a release valve

7. The robots.txt Trap (and How to Avoid It)

The single biggest mistake we see in production: publishing llms.txt while blocking AI bots in robots.txt. This is self-defeating, and it's shockingly common.

Here's the broken setup:

# robots.txt — BROKEN if you also publish llms.txt

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

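The fix is straightforward: explicitly allow the AI bots you want to read llms.txt. One caveat: a crawler obeys only the group that names it and falls back to the * group only when no group does, so the bots listed below do not inherit the Disallow rules from the * group. Repeat those rules inside each bot's group if they should still apply.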

# robots.txt — works with llms.txt

User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml

Our checker flags this contradiction so you can fix it. Easy fix, big upside.
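You can reproduce the core of this check yourself with Python's standard-library `urllib.robotparser`. The sketch below is a rough approximation and not how our checker is implemented; the bot list and the function name `blocked_bots` are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# User agents to test; extend the list with whichever bots you care about.
AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_bots(robots_txt: str, url: str = "https://example.com/llms.txt") -> list[str]:
    """Return the AI bots that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]
```

An empty list means none of the listed bots is locked out of llms.txt. Any name that comes back points at a group in robots.txt that contradicts the file you are publishing.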

8. Common Mistakes and How to Fix Them

Here's what we see most often across the sites we check — ranked by frequency.

Mistake 1: Server returns HTML instead of markdown

Fix: configure your server (Next.js public folder, Vercel static routing, Cloudflare Pages, whatever) to serve /llms.txt as a static file with Content-Type: text/markdown.
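A quick way to verify the fix is to inspect the Content-Type header your server returns. The helper below is a small sketch with a hypothetical name (`is_accepted_content_type`). It only classifies a header value, so you would feed it the value from your HTTP client of choice.

```python
ACCEPTED_TYPES = {"text/markdown", "text/plain"}


def is_accepted_content_type(header_value: str) -> bool:
    """True if the media type (ignoring parameters such as charset) is accepted."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ACCEPTED_TYPES
```

A value such as `text/html; charset=utf-8` fails this test, which usually means a framework catch-all route is answering instead of the static file.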

Mistake 2: Missing H1

Fix: start the file with # Your Site Name on the first non-blank line.

Mistake 3: Bare URLs instead of markdown links

Authors paste URLs as-is, like - https://example.com/docs. This works technically, but wastes the description slot. AI agents pull the link text as context — bare URLs give them nothing extra.

Fix: use the [Title](url): description format every time.

Mistake 4: HTML anchor tags

Often a result of copy-paste from a rendered web page. Something like <a href="...">Quickstart</a>. Markdown parsers may or may not extract these correctly.

Fix: convert to markdown link syntax. If your CMS exports HTML, run it through a markdown converter before publishing.

Mistake 5: Generic anchor text ("Click here")

Fix: use descriptive link text that stands alone, such as [Pricing plans](https://example.com/pricing).
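Mistakes 3 through 5 are all link-format problems, so a single line-level lint can catch them. The sketch below is illustrative only: the regular expressions, the set of generic phrases, and the function name `lint_line` are our assumptions, and a production linter would need a real markdown parser.

```python
import re

# Phrases treated as generic anchor text; this list is an assumption.
GENERIC_TEXT = {"click here", "here", "read more", "learn more", "link"}

MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")
HTML_ANCHOR = re.compile(r"<a\s[^>]*href=", re.IGNORECASE)
ANY_URL = re.compile(r"https?://\S+")


def lint_line(line: str) -> list[str]:
    """Return the link-format problems found on one line of llms.txt."""
    problems = []
    has_anchor = bool(HTML_ANCHOR.search(line))
    if has_anchor:
        problems.append("html-anchor")
    # Strip well-formed markdown links; any URL left over is a bare URL.
    remainder = MD_LINK.sub("", line)
    if not has_anchor and ANY_URL.search(remainder):
        problems.append("bare-url")
    for text, _url in MD_LINK.findall(line):
        if text.strip().lower() in GENERIC_TEXT:
            problems.append("generic-text")
    return problems
```

Calling `lint_line` on every line of the file and printing the non-empty results gives a quick report of which entries need rewriting into the [Title](url): description form.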

Mistake 6: Auth-walled or paywalled links

Listing a URL that returns 401 or redirects to a login page is pointing AI at a dead end. Anonymous AI agents can't authenticate. They get a login screen and abandon the link.

Fix: either expose the content publicly or remove it from llms.txt. If you have to gate it, link to the public marketing page that previews the gated content instead.
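If you probe your own links, a rough heuristic for spotting gated pages could look like the following. The login path hints and the function name `looks_auth_walled` are assumptions, and the inputs (status code and final URL after redirects) are expected to come from whatever HTTP client you use.

```python
from urllib.parse import urlparse

# Path fragments that suggest a login page; a heuristic, not an exhaustive list.
LOGIN_HINTS = ("login", "signin", "sign-in")


def looks_auth_walled(status: int, requested_url: str, final_url: str) -> bool:
    """Guess whether an anonymous fetch hit an authentication wall."""
    if status in (401, 403):
        return True
    # A redirect that lands on a login-looking path counts as gated.
    if final_url != requested_url:
        path = urlparse(final_url).path.lower()
        return any(hint in path for hint in LOGIN_HINTS)
    return False
```

Because the path check is a guess, treat a positive result as a prompt to open the link in a private browser window, not as proof.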

Mistake 7: Links that conflict with your own robots.txt

Your llms.txt lists https://example.com/internal/api, but your robots.txt has Disallow: /internal/. Well-behaved AI agents follow robots.txt and refuse to load the page. You've advertised a dead end.

Fix: cross-check your llms.txt URLs against your own robots.txt. Pick one rule or the other — don't contradict yourself.
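The cross-check is easy to script with the standard-library `urllib.robotparser`. This is a minimal sketch, assuming both files are already loaded as strings. The function name `robots_conflicts` and the default user agent are illustrative choices, and the link extraction is a simple regular expression, not a full markdown parse.

```python
import re
from urllib.robotparser import RobotFileParser

# Captures the URL part of markdown links: [text](https://...)
MD_URL = re.compile(r"\]\((https?://[^)\s]+)\)")


def robots_conflicts(llms_txt: str, robots_txt: str, user_agent: str = "ClaudeBot") -> list[str]:
    """Return llms.txt URLs that robots.txt forbids for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    urls = MD_URL.findall(llms_txt)
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Run it once per bot you care about, since different groups in robots.txt can produce different answers for the same URL.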

Mistake 8: File too large (over 500KB)

A 1MB llms.txt is "technically valid" but defeats its own purpose. AI agents may truncate it, refuse to load it, or burn so much context window that there's nothing left for the actual page content.

Mistake 9: Stale content

Fix: treat llms.txt like a sitemap. Regenerate quarterly, or whenever navigation changes.

Mistake 10: No /llms-full.txt companion

Fix: most static site generators have plugins that auto-generate /llms-full.txt from your markdown content. Or write a small build script. Or use a tool like ours that auto-generates both.
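If you go the build-script route, something along these lines is enough to start with. It is a sketch under stated assumptions: your pages live as .md files under one directory, and the HTML comment used to mark each source file is our own convention, not something the spec defines. The function name `build_llms_full` is hypothetical.

```python
from pathlib import Path


def build_llms_full(content_dir: str, site_title: str) -> str:
    """Concatenate every markdown file under content_dir into one document."""
    parts = [f"# {site_title}", ""]
    for path in sorted(Path(content_dir).rglob("*.md")):
        # Mark where each page starts so the origin of a passage stays traceable.
        parts.append(f"<!-- source: {path.relative_to(content_dir).as_posix()} -->")
        parts.append(path.read_text(encoding="utf-8").strip())
        parts.append("")
    return "\n".join(parts)
```

Write the returned string to llms-full.txt in your build output, for example with `Path("public/llms-full.txt").write_text(...)`, where the `public` directory is again an assumption about your project layout.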

9. Generating Yours with AI

Our LLMs.txt Checker includes a generator that runs in four steps:

Crawl — automatically discovers up to 50 of your most important pages, prioritized by homepage status, internal link count, and content freshness.
AI categorize — feeds the page list (URL + title + meta description) to an AI model. It groups pages into H2 sections and writes the H1 + blockquote summary.
Edit — returns an editable markdown textarea. You refine section names, tighten descriptions, move low-priority URLs into the Optional section.
Ship — copy or download the final file. Upload to your domain root as /llms.txt.

The generator uses 50 credits per run, refunded automatically if generation fails or returns malformed output. Worst-case cost is zero. Typical run time: 30-90 seconds.

A note on AI generator output

10. Frequently Asked Questions

Will AI agents actually load my llms.txt, or is this all theoretical?

Should I publish llms.txt if my site is mostly marketing pages (no docs)?

What about multi-language sites?

Does llms.txt replace sitemap.xml?

What if I have content I don't want AI to use for training?

Can a single llms.txt file have thousands of links?

Is there a way to validate my llms.txt automatically?

Yes — our LLMs.txt Checker validates against the full llms.txt spec in about 30 seconds. Free.

Will llms.txt still matter in 2027?

What's the difference between ai.txt, llms.txt, and robots.txt?

Ship your llms.txt today

Validate yours against the full llms.txt spec in 30 seconds. Don't have one? Our generator crawls your site and produces a spec-conformant llms.txt in under 2 minutes.

Run the LLMs.txt Checker
