Can I generate llms.txt automatically?

Yes. AI-powered generators crawl your site, pick the most important pages, and produce a spec-conformant llms.txt with curated H2 sections, blockquote summary, and proper markdown links. The InstaRank SEO LLMs.txt Generator produces it automatically — you review and edit before publishing.

How long does AI generation take?

30 to 90 seconds typically. The crawl phase is the slowest part (10-60 seconds depending on site speed). The generation itself completes in 5-15 seconds.

Do I need to edit AI-generated llms.txt before publishing?

Yes, always edit before publishing. The AI is excellent at structure and categorization but less good at brand voice, product nuance, and knowing which obscure page secretly drives 30% of your support tickets. Treat the AI output as a strong first draft, not a finished file.

What pages does the generator crawl?

We prioritize your homepage first, then your most internally-linked pages — the ones most worth surfacing to AI. Only live pages (HTTP 200) with a title or H1 are included, so the result is consistent run to run.

What if the AI invents URLs that don't exist on my site?

The generator validates every AI-suggested URL against your real crawled pages. If the AI invents URLs, they are automatically removed before the file is returned, so you never ship a broken link.

How often should I regenerate llms.txt?

Treat it like a sitemap. Regenerate when major content launches, when navigation changes, when our checker shows broken-link issues, and quarterly at minimum.

Can I generate llms.txt for a competitor's site?

You can generate against any domain you have authorization to scan. The generator crawls public pages only. Whether to deploy a competitor's generated llms.txt on your own site is a question you should answer carefully: that file shapes how AI describes you, not them.

How to Create a Perfect llms.txt File (AI-Powered, 2026)

How much does AI-generated llms.txt cost?

Our generator costs 50 credits per run, which covers the crawl plus one generation pass. Credits are refunded automatically if generation fails or returns malformed output, so the worst-case cost is zero.

May 26, 2026 • 10 min read • Technical SEO

Hand-curating an llms.txt for a 500-page documentation site is brutal. You're manually deciding which URLs make the cut, which H2 section each one belongs in, and what one-line description gives the AI useful signal. For most teams, that's a half-day of work spread across product, marketing, and dev, and the file gets stale within weeks. There's a faster way: use AI to generate it. This guide walks through the exact 4-step workflow we use, with real prompt examples and the full validation checklist.

TL;DR Summary

4 steps: crawl, AI categorize, edit, ship. Total time: 5-10 minutes.
Crawl pulls up to 50 pages, prioritized by homepage status + internal link count.
AI groups pages into H2 sections, writes the H1 + blockquote summary, and outputs editable markdown.
Always edit before publishing — the AI is good at structure, less good at brand voice.
Our generator: 50 credits per run, refunded automatically on failure.
After shipping, validate with the LLMs.txt Checker against the full llms.txt spec.
Regenerate quarterly or when navigation changes.

1. Why Use AI to Generate llms.txt

The honest answer: hand-curation is fine for small sites. If you have 20 marketing pages and one product, you can write a great llms.txt over coffee. The problem is what happens beyond that.

The hand-curation tax

Once your site has 100+ pages, hand-curation gets painful:

You have to review every page to decide if it's worth surfacing to AI
You have to group pages into logical H2 sections — what counts as "Documentation" vs "Guides" vs "Reference"?
You have to write a one-line description for each page that's useful to an AI
You have to decide what goes in the Optional section
You have to do this again every time the site changes

For a 500-page docs site, that's 4-8 hours of focused work. Most teams don't have that time, so they either ship a bad llms.txt (3 H2 sections, 12 links, no descriptions) or ship none at all.

What AI does well

Large language models are genuinely good at:

Page categorization: looking at 50 URLs + titles + descriptions, picking sensible H2 groupings, deciding what belongs where
Concise descriptions: turning a 200-word meta description into a 12-word link description
Structural consistency: keeping every H2 section similarly sized, every link in the same format
Identifying Optional content: changelog, press, legal — the AI knows what humans typically skip

These are exactly the parts that take humans the longest. Delegating them to AI cuts the work from hours to minutes.

What AI does badly

AI is less good at:

Brand voice: the blockquote summary the AI writes will be technically accurate but generic
Product nuance: the AI doesn't know that your "Webhooks" section is critical to enterprise customers but irrelevant to free-tier users
Tribal knowledge: which obscure page secretly answers 30% of your support tickets? The AI can't see that signal
Strategic curation: the AI doesn't know which pages you're actively de-emphasizing for business reasons

That's why every workflow ends with a human editing pass. AI handles the volume; you handle the judgment.

2. The 4-Step AI Workflow

Every llms.txt generator follows roughly the same flow. Here's the version we ship in the InstaRank SEO LLMs.txt Generator:

Crawl — discover the URLs that matter (10-60 seconds)
AI categorize — group + describe + structure (5-15 seconds)
Edit — human refinement (2-5 minutes)
Ship — upload to production (1 minute)

Total elapsed time: under 10 minutes for the first run. Subsequent regenerations are faster because you already know the editing patterns.

3. Step 1: Crawl Your Site

The generator can't recommend pages it doesn't know exist. So step 1 is discovery. The crawl phase has three jobs: find candidate URLs, fetch their titles + meta descriptions, and prioritize the list.

What the crawler pulls

For each candidate page:

URL — the absolute URL, with protocol and path
Title — from the HTML <title> tag, fallback to H1
Meta description — from <meta name="description">
Status code — only 2xx pages are kept
Internal link count — used as a popularity proxy
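
Here's a minimal sketch of that per-page record and the keep/drop rule. The type and field names (`RawPage`, `CandidatePage`, `h1`, `internalLinkCount`) and the helper names are hypothetical, chosen for illustration rather than taken from our generator's actual schema.

```typescript
// Hypothetical shape of one crawled page; field names are illustrative.
interface RawPage {
  url: string;               // absolute URL, with protocol and path
  status: number;            // HTTP status code
  title?: string;            // from the <title> tag
  h1?: string;               // fallback when <title> is missing or empty
  metaDescription?: string;  // from <meta name="description">
  internalLinkCount: number; // popularity proxy
}

interface CandidatePage {
  url: string;
  title: string;
  description: string;
  internalLinkCount: number;
}

// Keep only 2xx pages that have a usable title, falling back to the H1.
function toCandidate(page: RawPage): CandidatePage | null {
  if (page.status < 200 || page.status > 299) return null;
  const title = (page.title || "").trim() || (page.h1 || "").trim();
  if (title === "") return null;
  return {
    url: page.url,
    title,
    description: (page.metaDescription || "").trim(),
    internalLinkCount: page.internalLinkCount,
  };
}

function toCandidates(pages: RawPage[]): CandidatePage[] {
  const kept: CandidatePage[] = [];
  for (const page of pages) {
    const candidate = toCandidate(page);
    if (candidate !== null) kept.push(candidate);
  }
  return kept;
}
```

Pages with no description are kept with an empty string, so the later steps can decide whether to write one.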

Page prioritization

A good generator caps the input at around 50 pages — the sweet spot for an AI's context budget. To pick the most important pages from a larger site, prioritize in this order:

Homepage first — always. It sets the project framing.
Then by internal link count — descending. Pages with the most inbound internal links are usually the most important ones.
Then alphabetical — as a stable tiebreaker.

Consistent ordering matters: running the generator twice on the same site should produce the same prioritized page list, so your llms.txt is reproducible rather than random.
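
The ordering above can be sketched as a single comparator. This assumes "alphabetical" means comparing URLs by code unit (the text doesn't say which field breaks ties), and the names `Page` and `prioritizePages` are hypothetical.

```typescript
interface Page {
  url: string;
  internalLinkCount: number;
}

// Sort: homepage first, then inbound internal links (descending),
// then URL as a stable tiebreaker. Returns at most `limit` pages.
function prioritizePages<T extends Page>(
  pages: T[],
  homepageUrl: string,
  limit: number = 50
): T[] {
  const sorted = pages.slice().sort((a, b) => {
    const aHome = a.url === homepageUrl ? 0 : 1;
    const bHome = b.url === homepageUrl ? 0 : 1;
    if (aHome !== bHome) return aHome - bHome;
    if (a.internalLinkCount !== b.internalLinkCount) {
      return b.internalLinkCount - a.internalLinkCount;
    }
    // Plain code-unit comparison: unlike localeCompare, it gives the
    // same order on every machine regardless of locale settings.
    return a.url < b.url ? -1 : a.url > b.url ? 1 : 0;
  });
  return sorted.slice(0, limit);
}
```

Because every tie is broken by URL, the result does not depend on the order in which the crawler happened to discover pages.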

A note on JavaScript-rendered sites

The crawl reads your server-rendered HTML for speed. If your site is fully SPA-rendered and the meta descriptions only appear after JS executes, the crawl will surface URLs but with thin metadata. Workaround: ensure your SPA outputs title + meta description in the initial HTML server-side (Next.js metadata API, Nuxt useHead, etc.). That's also better for traditional SEO.
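
To check whether a page would come back with thin metadata, you can inspect the raw HTML before any JavaScript runs. The sketch below is a rough approximation using regular expressions; a real crawler would use a proper HTML parser, and the function names are hypothetical.

```typescript
interface PageMetadata {
  title: string | null;
  description: string | null;
}

// Reads <title> and <meta name="description"> from raw, un-rendered HTML.
// Regex-based for brevity: it will miss unusual markup, such as
// attribute values that contain ">".
function extractMetadata(html: string): PageMetadata {
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = titleMatch ? titleMatch[1].trim() : "";

  let description = "";
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    if (!/\bname\s*=\s*["']description["']/i.test(tag)) continue;
    const content = /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(tag);
    if (content) description = (content[1] || content[2] || "").trim();
    break; // only the first description tag counts
  }

  return { title: title || null, description: description || null };
}

// True when the initial HTML lacks a title or a meta description,
// which is what a client-rendered SPA shell typically looks like.
function hasThinMetadata(html: string): boolean {
  const meta = extractMetadata(html);
  return meta.title === null || meta.description === null;
}
```

Running `hasThinMetadata` against the response body of a plain HTTP GET (no headless browser) tells you what a server-side crawl will see.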

4. Step 2: AI Categorization

Once we have the page list, we feed it to an AI model that groups your pages into H2 sections and writes the H1 + blockquote summary.

The AI prompt structure

The system prompt establishes the spec:

You are an expert SEO and AI optimization engineer producing
a spec-conformant /llms.txt file for the site {SITE_NAME}
({DOMAIN}).

The /llms.txt standard requires this exact markdown structure:

# <Site / Project Name>

> <One-sentence summary>

## <Section Name>

- [Page Title](url): Optional description
- [Another Page](url)

## Optional

- [Lower-priority page](url): Skippable section

Rules:
1. H1 is required (site/project name).
2. Blockquote (>) directly after H1.
3. H2 sections group pages by purpose.
4. Every list item MUST be markdown link format.
5. Use absolute URLs only.
6. Include "## Optional" section.
7. Output PURE markdown only — no code fences.

The user prompt provides the crawled page data:

Generate the /llms.txt file for {SITE_NAME} ({DOMAIN}).

Input pages:
1. URL: https://example.com/
   Title: Acme Analytics — Real-time product analytics
   Description: Connect to your warehouse in 5 minutes...

2. URL: https://example.com/docs/quickstart
   Title: Quickstart Guide
   Description: Five-minute setup from npm install...

[...up to 50 pages...]

Output the complete markdown content of /llms.txt only.
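
If you're assembling that user prompt in code, a sketch might look like the following. The names `PromptPage`, `oneLine`, and `buildUserPrompt` are hypothetical; the output format mirrors the example above.

```typescript
interface PromptPage {
  url: string;
  title: string;
  description: string;
}

// Collapse newlines and runs of whitespace so a multi-line title
// cannot break the numbered-entry layout.
function oneLine(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

function buildUserPrompt(siteName: string, domain: string, pages: PromptPage[]): string {
  const entries = pages.map((page, index) => {
    const lines = [
      `${index + 1}. URL: ${page.url}`,
      `   Title: ${oneLine(page.title)}`,
    ];
    const description = oneLine(page.description);
    // Skip the Description line entirely when the crawl found none.
    if (description !== "") lines.push(`   Description: ${description}`);
    return lines.join("\n");
  });

  return [
    `Generate the /llms.txt file for ${siteName} (${domain}).`,
    "",
    "Input pages:",
    entries.join("\n\n"),
    "",
    "Output the complete markdown content of /llms.txt only.",
  ].join("\n");
}
```

Pass it the already-prioritized page list so the model sees the homepage first.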

Generation settings that matter

If you build your own generator, a few model settings make a real difference:

A generous output limit (~16K tokens): at roughly 4 bytes per token that supports up to ~64KB of output, enough for the largest reasonable llms.txt
A low temperature (~0.4): low enough for consistency, high enough for natural prose in descriptions
A generous timeout (~90 seconds): a typical run finishes in 5-15 seconds, but the headroom keeps slow runs on large sites from being aborted mid-response
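
As a rough illustration, those settings and the token-to-bytes arithmetic could be captured like this. The field names are generic placeholders, not any particular provider's API parameters, and the 4-bytes-per-token ratio is a rule of thumb for English text rather than a guarantee.

```typescript
// Generic settings object; map these onto whatever your model API expects.
interface GenerationSettings {
  maxOutputTokens: number;
  temperature: number;
  timeoutMs: number;
}

const DEFAULT_SETTINGS: GenerationSettings = {
  maxOutputTokens: 16000, // ~16K tokens
  temperature: 0.4,
  timeoutMs: 90000,       // ~90 seconds
};

// Assumption: about 4 bytes of English markdown per token.
const APPROX_BYTES_PER_TOKEN = 4;

function approxMaxOutputBytes(settings: GenerationSettings): number {
  return settings.maxOutputTokens * APPROX_BYTES_PER_TOKEN;
}

// Quick sanity check before a run: would a file of the expected
// size fit inside the output limit?
function fitsOutputBudget(expectedBytes: number, settings: GenerationSettings): boolean {
  return expectedBytes <= approxMaxOutputBytes(settings);
}
```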

Output validation

AI output is validated against the llms.txt spec. We detect two failure modes:

Truncation: the AI hit its token limit mid-section. We detect this by looking for dangling list markers (- [ with no closing) or an empty trailing H2 section. Retry once with reduced section count.
Hallucination: the AI invented URLs that weren't in the input. We compare every URL in the output against the input allowlist. Retry once with an explicit allowlist constraint in the prompt.

If retries don't fix it, the generator strips any remaining invalid lines, returns the validated result, flags it as partial, and refunds the credits.
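
Both checks, plus the final strip step, can be approximated in a few small functions. This is a sketch under simplifying assumptions (exact string match on URLs, no parentheses inside URLs), not our production validator, and the function names are hypothetical.

```typescript
// A complete list item: "- [text](url)" optionally followed by a description.
const COMPLETE_ITEM = /^- \[[^\]]+\]\((\S+?)\)/;

// Hallucination check: URLs in the output that were not in the crawled input.
function findUnknownUrls(markdown: string, allowlist: string[]): string[] {
  const allowed = new Set<string>(allowlist);
  const unknown: string[] = [];
  const linkUrl = /\]\((\S+?)\)/g;
  let match: RegExpExecArray | null;
  while ((match = linkUrl.exec(markdown)) !== null) {
    const url = match[1];
    if (!allowed.has(url) && unknown.indexOf(url) === -1) unknown.push(url);
  }
  return unknown;
}

// Truncation check: the last non-blank line is a dangling list item
// or an H2 heading with nothing under it.
function looksTruncated(markdown: string): boolean {
  const lines = markdown.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  if (lines.length === 0) return true;
  const last = lines[lines.length - 1];
  if (/^##(\s|$)/.test(last)) return true;
  return last.charAt(0) === "-" && !COMPLETE_ITEM.test(last);
}

// Last resort after retries: drop list items that are incomplete
// or that point at URLs outside the allowlist.
function stripInvalidLines(markdown: string, allowlist: string[]): string {
  const allowed = new Set<string>(allowlist);
  return markdown
    .split("\n")
    .filter((line) => {
      const trimmed = line.trim();
      if (trimmed.charAt(0) !== "-") return true; // headings, blockquote, blanks
      const match = COMPLETE_ITEM.exec(trimmed);
      return match !== null && allowed.has(match[1]);
    })
    .join("\n");
}
```

The truncation check cannot spot a description cut off mid-word, so treat it as a heuristic rather than proof that the output is complete.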

5. Step 3: Edit the Draft

The generator returns an editable markdown textarea. Always edit before publishing. Here's what to look at.

H1 wording

The AI extracts your H1 from the homepage title. That's usually right, but sometimes the title is stripped ("Acme | Real-time analytics" becomes just "Acme") when you wanted the full thing. Make sure the H1 reads like your actual brand, not a stripped title tag.

Blockquote summary

The AI writes a generic one-sentence summary. Tighten it. The blockquote is the first thing AI agents see — it sets the framing for everything below. A good summary tells the AI what your product does and who it's for in one sentence.

Bad: > Acme is a software company providing solutions for businesses. (generic, says nothing)

Good: > Acme is real-time product analytics for engineering teams — connects to your warehouse in 5 minutes, sub-second query performance. (specific, named audience, one differentiator)

Section names

The AI picks sensible H2 names, but you know your users better. Match how your users would phrase the question. If your customers ask "how much does it cost", the section should be ## Pricing, not ## Plans Overview.

Link descriptions

The AI uses your meta descriptions as link descriptions. That's often fine, but meta descriptions are written for Google SERPs — they're sometimes too markety. Rewrite any link description that reads like ad copy.

Bad: [API Reference](url): The world's most powerful analytics API with unmatched scalability

Good: [API Reference](url): Full endpoint catalog with code samples in 8 languages

Optional section curation

The AI's default Optional section is usually too small. Move more content there. Things that belong in Optional:

Changelog
Press coverage
Press releases
About / Team / Careers
Legal / Privacy / Terms
Old or deprecated content you can't remove yet
Marketing pages (sometimes — depends on your strategy)

The Optional section lets AI agents drop those URLs first when context budget is tight, preserving the budget for what actually matters.

Size check

Keep total file size under 50KB. If you're over, either trim links or move bulk content to /llms-full.txt. The generator displays the current size in the editor.
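
If you want the same size check outside the editor, count bytes rather than characters, since non-ASCII text takes more than one byte per character in UTF-8. A minimal sketch, assuming 1KB means 1,024 bytes (the function names are hypothetical):

```typescript
// UTF-8 byte length of a string, computed without platform APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff && i + 1 < text.length) {
      bytes += 4; // surrogate pair encodes one 4-byte code point
      i++;        // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// "Under 50KB" is read here as strictly fewer than 50 * 1024 bytes.
function isWithinSizeBudget(text: string, limitKb: number = 50): boolean {
  return utf8ByteLength(text) < limitKb * 1024;
}
```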

6. Step 4: Ship to Production

Once the file is edited, you ship it. Three common deployment patterns:

Static hosting (Vercel, Netlify, Cloudflare Pages)

Drop the file in your static-asset directory:

# Next.js
public/llms.txt

# Vite / Vue
public/llms.txt

# Nuxt
public/llms.txt

# Astro
public/llms.txt

# Hugo
static/llms.txt

Most static hosts serve .txt files with Content-Type: text/plain by default — that's fine. Some hosts let you override per-file Content-Type via a config file (e.g., _headers on Cloudflare Pages, vercel.json on Vercel) if you want text/markdown specifically.

Next.js route handler

For Next.js, you can also serve it as a dynamic route handler if you want to generate the file on demand:

// app/llms.txt/route.ts
import { NextResponse } from 'next/server';

export async function GET() {
  const content = `# Acme Analytics

> Real-time product analytics for engineering teams.

## Product

- [Features](https://acme.com/features): Real-time dashboards
- [Pricing](https://acme.com/pricing): Free / Pro / Enterprise

## Documentation

- [Quickstart](https://acme.com/docs): Five-minute setup
`;

  return new NextResponse(content, {
    headers: { 'Content-Type': 'text/markdown; charset=utf-8' },
  });
}

Custom server

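With Nginx, Apache, or any other custom server, configure /llms.txt to be served with the right Content-Type. Example Nginx config:

location = /llms.txt {
    alias /var/www/yoursite/llms.txt;
    # .txt maps to text/plain by default, and add_header would only append
    # a second Content-Type header. Clear the type map and set it directly.
    types { }
    default_type "text/markdown; charset=utf-8";
    add_header Cache-Control "public, max-age=3600";
}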

Verify after deploy

After deployment, hit the URL with curl:

curl -I https://yourdomain.com/llms.txt
# Want to see:
# HTTP/2 200
# Content-Type: text/markdown; charset=utf-8

curl https://yourdomain.com/llms.txt | head -5
# Want to see your H1 + blockquote on top
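
If you'd rather script that check, for example in a deploy pipeline, the pass/fail logic is small. The sketch below only evaluates a status code and a Content-Type header value that you have already fetched; how you fetch them is up to you, and `checkLlmsTxtResponse` is a hypothetical name.

```typescript
// Returns a list of problems; an empty list means the response looks right.
function checkLlmsTxtResponse(status: number, contentType: string | null): string[] {
  const problems: string[] = [];
  if (status !== 200) problems.push(`expected HTTP 200, got ${status}`);

  // Ignore parameters such as "; charset=utf-8" when comparing.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  if (mime !== "text/markdown" && mime !== "text/plain") {
    problems.push(`unexpected Content-Type: ${mime || "(missing)"}`);
  }
  return problems;
}
```

Both `text/markdown` and `text/plain` pass, matching the guidance above that plain text is acceptable and HTML is not.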

7. The Validation Checklist

Before declaring victory, run through the full validation list. Easiest way: paste your domain into our LLMs.txt Checker. The checker runs through the full list in about 30 seconds. Doing it manually? Here's the list:

File exists at /llms.txt — HTTP 200, HTTPS, root path
Has H1 title — first non-blank line starts with #
Has H2 sections with markdown links — at least one ## Section followed by a markdown list of links
All sampled URLs return 200 — spot check 5-10 links manually, or use our checker for a 25-link sample
Has blockquote summary — > line directly after the H1 (blank lines between are okay)
Correct Content-Type — text/markdown or text/plain, never text/html
No duplicate links — every URL appears at most once
Markdown link format only — no bare URLs, no HTML <a> tags, no generic anchor text ("click here", "read more")
No auth-walled or paywalled links — every URL is publicly accessible
Healthy file size — under 50KB ideal, under 150KB OK, over 500KB warning
Linked URLs not blocked by robots.txt — cross-check against your own robots.txt
Site's robots.txt allows AI bots — GPTBot, ClaudeBot, PerplexityBot, Google-Extended must be Allowed
Bonus: /llms-full.txt companion exists — paired full-content file
Bonus: HTML discovery tag — <link rel="llms" href="/llms.txt"> in homepage <head>
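
The structural items on that list (H1, blockquote, H2 sections with links, link format, anchor text, duplicates) can be checked offline against the file contents. Here's a rough sketch; it is not the logic our checker runs, the names are hypothetical, and it skips everything that needs network access (status codes, robots.txt, Content-Type).

```typescript
interface LintResult {
  check: string;
  ok: boolean;
}

// Anchor texts the checklist calls out as too generic.
const GENERIC_ANCHORS = ["click here", "read more"];

function lintLlmsTxt(markdown: string): LintResult[] {
  // Blank lines are allowed anywhere, so drop them up front.
  const lines = markdown.split("\n").map((l) => l.trim()).filter((l) => l !== "");
  const linkItem = /^- \[([^\]]+)\]\((\S+?)\)/;

  const hasH1 = lines.length > 0 && /^# \S/.test(lines[0]);
  const hasBlockquote = hasH1 && lines.length > 1 && /^>\s*\S/.test(lines[1]);

  let inSection = false;
  let sectionHasLinks = false;
  let linkFormatOnly = true;
  let descriptiveAnchors = true;
  let noDuplicates = true;
  const seenUrls = new Set<string>();

  for (const line of lines) {
    if (/^## \S/.test(line)) {
      inSection = true;
      continue;
    }
    // Bare URLs and raw HTML anchors are not allowed anywhere.
    if (/^https?:\/\//i.test(line) || /<a\s/i.test(line)) {
      linkFormatOnly = false;
      continue;
    }
    if (line.charAt(0) !== "-") continue;
    const match = linkItem.exec(line);
    if (match === null) {
      linkFormatOnly = false;
      continue;
    }
    if (inSection) sectionHasLinks = true;
    if (GENERIC_ANCHORS.indexOf(match[1].trim().toLowerCase()) !== -1) {
      descriptiveAnchors = false;
    }
    if (seenUrls.has(match[2])) noDuplicates = false;
    seenUrls.add(match[2]);
  }

  return [
    { check: "has H1 title", ok: hasH1 },
    { check: "has blockquote summary", ok: hasBlockquote },
    { check: "has H2 section with links", ok: sectionHasLinks },
    { check: "markdown link format only", ok: linkFormatOnly },
    { check: "no generic anchor text", ok: descriptiveAnchors },
    { check: "no duplicate links", ok: noDuplicates },
  ];
}

// Convenience wrapper: names of the checks that failed.
function failedChecks(markdown: string): string[] {
  return lintLlmsTxt(markdown).filter((r) => !r.ok).map((r) => r.check);
}
```

An empty array from `failedChecks` means the file passes the structural checks; the network-dependent items still need a separate pass.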

8. Prompt Engineering for Custom Generators

If you're building your own llms.txt generator (e.g., for a specific platform or with a different AI provider), here are the prompt-engineering lessons we've learned the hard way.

Be explicit about the spec

Don't assume the model knows the llms.txt spec. Include the full structure example in the system prompt every time. Models trained before September 2024 may not have seen the spec at all.

Forbid code fences explicitly

Models love wrapping output in ```markdown ... ``` fences. The output needs to be pure markdown — no fences. Add "Output PURE markdown — no code fences, no commentary, no preamble." to your system prompt, and strip any leading/trailing fences in post-processing as belt-and-suspenders.
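
The post-processing half of that advice is a few lines of string handling. A minimal sketch (`stripCodeFences` is a hypothetical name) that removes one wrapping fence, with or without a language tag, and leaves unfenced output alone:

```typescript
// Three backticks, built indirectly so this sample can itself sit in a fenced block.
const FENCE = "`".repeat(3);

// Removes a single wrapping code fence if the model added one anyway.
function stripCodeFences(output: string): string {
  let text = output.trim();
  if (text.startsWith(FENCE)) {
    // Drop the whole opening line, including a tag such as "markdown".
    const firstNewline = text.indexOf("\n");
    text = firstNewline === -1 ? "" : text.slice(firstNewline + 1);
  }
  text = text.trim();
  if (text.endsWith(FENCE)) {
    text = text.slice(0, text.length - FENCE.length);
  }
  return text.trim();
}
```

Run this before validation so that a stray fence line is not mistaken for file content.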

Constrain to allowlisted URLs

Pass the URL list in the prompt and explicitly say "Only use URLs from the provided list. Do NOT invent URLs." Then validate the output against the allowlist programmatically; don't trust the constraint to hold every time.

Set output size limits

For a 50-page input, set a generous output limit (~16K tokens or more). Truncation mid-section is far more annoying than hitting the limit cleanly.

Two-pass for quality

For higher-quality output, do two passes: first pass generates structure + sections + link descriptions. Second pass rewrites the blockquote and refines descriptions. More expensive (~2x tokens) but the output is noticeably better. Our public generator runs single-pass for cost reasons; internal tools can splurge.

9. When to Regenerate

llms.txt isn't a ship-it-and-forget-it file. AI agents reward freshness. Regenerate when:

Major content launches — new product area, doc rewrite, new section
Navigation changes — URL restructure, renamed sections, deprecated routes
Quarterly cadence — even if nothing major changed; AI agents reward updated Last-Modified headers
Broken link issues — when our checker shows sampled URLs returning 404, regenerate to pick up the current URL set
Brand voice changes — new positioning means re-writing the blockquote

Most sites under-update. A regeneration takes 5 minutes — easier than waiting for someone to complain.

10. Frequently Asked Questions

How much does AI-generated llms.txt cost?

50 credits per run on InstaRank SEO. Credits are refunded automatically if generation fails or returns malformed output, so the worst-case cost is zero. Other vendors price similarly. If you run it yourself against a frontier-model API, expect roughly $0.05-0.20 per generation.

Which AI model is best for this?

Any frontier model that's good at structured markdown output works well here. Honestly, the prompt matters more than the model — a clear spec and a strict allowlist beat raw model horsepower.

Can the AI hallucinate URLs that don't exist?

Yes, occasionally. We detect hallucinations by validating every output URL against the input allowlist. On detection, we retry once with explicit constraints. If hallucinations persist, we strip them before returning the file.

What if my site has 5000 pages?

The generator caps at 50 pages because that's the AI context sweet spot. For very large sites, the right move is to pick your top 30-50 manually, then use the generator on that curated set. Or ship a tighter llms.txt + a more exhaustive llms-full.txt.

Can I use the generator for a competitor's domain?

You can scan any domain you have authorization to crawl. Whether you should deploy a competitor's generated llms.txt on your own site is a different question — that file is meant to shape how AI describes you, not them. The legitimate use case is competitor research: see what their llms.txt looks like for your own benchmarking.

Does this work for non-English sites?

Yes. Frontier LLMs handle most major languages well. The structure (H1, blockquote, H2 sections) is language-agnostic. The descriptions will be in whatever language the source pages use.

How do I A/B test different llms.txt versions?

Not directly, because there's only one /llms.txt path. The indirect approach is sequential: ship version A, wait 30 days, measure AI-referred traffic (UTM tags help), switch to version B, and measure again. It's slow, but it's the only signal worth tracking.

Should I include affiliate links or marketing CTAs?

No. AI agents may quote your llms.txt content verbatim in answers. You don't want a customer's AI assistant to surface "Buy Acme Pro now, 30% off this week!" mid-conversation. Keep llms.txt informational. Save the CTAs for the linked pages themselves.

Generate your llms.txt now

50 credits per run. Refunded automatically on failure. Under 2 minutes from URL to publishable markdown.

Open the LLMs.txt Generator
