What is an XML Sitemap?
An XML sitemap is a structured file that lists all important URLs on your website in a format that search engines can easily read. Think of it as a roadmap for crawlers like Googlebot, Bingbot, and other search engine bots — it tells them exactly which pages exist on your site and provides metadata about each page.
The file follows the sitemaps.org protocol, an industry standard supported by Google, Bing, Yahoo, and other major search engines. A sitemap is typically located at example.com/sitemap.xml and can also be referenced in your robots.txt file.
Basic XML Sitemap Example
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-12-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
Why Sitemaps Matter for SEO
While search engines can discover pages through internal links, a sitemap provides a direct, explicit list of all pages you want indexed. Here is why sitemaps are essential for your SEO strategy:
Faster Page Discovery
New pages and recently updated content are discovered faster when listed in a sitemap. Without one, search engines rely on following links from other pages, which can take days or weeks — especially for deeply nested content.
Crawl Budget Optimization
Search engines allocate a limited crawl budget to each site. By listing only your important, canonical pages in the sitemap, you guide crawlers to spend their budget on content that matters most for your rankings.
Metadata Communication
Sitemaps convey metadata like lastmod dates, helping search engines prioritize crawling of recently changed pages. Accurate lastmod values can significantly improve crawl efficiency.
Large Site Management
For sites with thousands or millions of pages, sitemaps are indispensable. Sitemap index files let you organize URLs into manageable chunks, ensuring comprehensive coverage that internal linking alone cannot achieve.
Types of Sitemaps
The sitemaps protocol supports several specialized formats beyond the standard URL sitemap. Each type serves a specific purpose and helps search engines understand different content types:
Standard XML Sitemap
The most common type, listing page URLs with optional metadata like lastmod, changefreq, and priority. Every website should have at least one standard sitemap.
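As a rough illustration of how a crawler or checker might read a standard sitemap, the sketch below parses a small document with Python's standard library. The SAMPLE string and the parse_sitemap helper are names made up for this example; optional fields simply come back as None when a <url> entry omits them.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Illustrative input: the second entry omits every optional field.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""


def parse_sitemap(xml_text):
    """Return one dict per <url> entry; optional fields are None when absent."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default=None, namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", default=None, namespaces=NS),
            "priority": url.findtext("sm:priority", default=None, namespaces=NS),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE):
        print(entry["loc"], entry["lastmod"])
```

Only loc is required by the protocol, so any code that consumes a sitemap should tolerate missing lastmod, changefreq, and priority values.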
Sitemap Index
A master file that references multiple child sitemaps. Required when your site exceeds the limit of 50,000 URLs (or 50MB uncompressed) per sitemap. The index file itself uses the <sitemapindex> tag instead of <urlset>.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-pages.xml</loc>
    <lastmod>2025-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2025-01-14</lastmod>
  </sitemap>
</sitemapindex>
Image Sitemap
Uses the <image:image> extension to help Google discover images that might not be found through regular crawling. This is especially important for sites that load images dynamically via JavaScript.
Video Sitemap
Uses the <video:video> extension to provide metadata about video content, including title, description, duration, and thumbnail URL. Enables video rich results in Google Search.
News Sitemap
Designed for Google News publishers, listing articles published within the last 48 hours. Uses the <news:news> extension with publication name, language, title, and publication date.
The 13 Parameters We Check
Our sitemap checker evaluates your sitemap across 13 validation parameters plus 3 informational parameters. Here is what we look for:
Critical Parameters
- XML Structure — Valid XML format required for search engines to parse your sitemap
- Relative URLs — All URLs must be absolute (e.g., https://example.com/page, not /page)
- Non-HTTPS URLs — All URLs should use HTTPS for security and ranking signals
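The three critical checks above can be approximated in a few lines of Python. This is a minimal sketch, not the checker's actual implementation: the critical_issues function and its issue labels are invented for illustration, and the structure check only tests that the XML is well formed, not that it validates against the sitemap schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def critical_issues(xml_text):
    """Return (check, detail) tuples for the three critical checks."""
    # 1. XML structure: stop early if the document cannot be parsed at all.
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return [("xml_structure", str(exc))]

    issues = []
    for loc in root.findall("sm:url/sm:loc", NS):
        url = (loc.text or "").strip()
        parts = urlsplit(url)
        # 2. Relative URLs: an absolute URL has both a scheme and a host.
        if not parts.scheme or not parts.netloc:
            issues.append(("relative_url", url))
        # 3. Non-HTTPS URLs: absolute, but served over another scheme.
        elif parts.scheme != "https":
            issues.append(("non_https_url", url))
    return issues
```

A sitemap that lists /page and http://example.com/a would yield one relative_url issue and one non_https_url issue, in document order.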
Moderate Parameters
- Sitemap URLs — Sitemap must not be empty; include your important pages
- Size Limits — Must not exceed the 50,000 URL or 50MB file size limit
- Robots.txt Reference — Sitemap URL should be declared in your robots.txt
- Cross-Domain URLs — All URLs must be from the same domain
- Orphaned URLs — Sitemap pages should have internal links pointing to them
- Sitemap Consistency — Website sitemap should match what is submitted to Google Search Console
- Index Coverage — Percentage of sitemap URLs that are actually indexed
Minor Parameters
- Lastmod Dates — All dates must be valid ISO 8601 format, not in the future
- Duplicate URLs — No duplicate entries to avoid wasted crawl budget
- Inaccessible URLs — All listed URLs should return HTTP 200 status
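Two of the minor checks, lastmod validity and duplicate detection, need no network access and can be sketched offline. The helper names below are hypothetical, and the date parsing covers only the common date-only and full-timestamp forms. Checking for inaccessible URLs would require an HTTP request per URL and is left out of this sketch.

```python
from collections import Counter
from datetime import date, datetime, timezone


def parse_lastmod(value):
    """Parse a date-only or full-timestamp lastmod; return None if invalid."""
    try:
        if len(value) == 10:  # date-only form, e.g. 2025-01-15
            d = date.fromisoformat(value)
            return datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat timestamps without an offset as UTC so comparisons are well defined.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def minor_issues(entries, now=None):
    """entries is a list of (loc, lastmod_or_None) pairs."""
    now = now or datetime.now(timezone.utc)
    issues = []
    for loc, lastmod in entries:
        if lastmod is None:
            continue  # lastmod is optional, so a missing value is not an error
        parsed = parse_lastmod(lastmod)
        if parsed is None:
            issues.append(("invalid_lastmod", loc))
        elif parsed > now:
            issues.append(("future_lastmod", loc))
    counts = Counter(loc for loc, _ in entries)
    issues.extend(("duplicate_url", loc) for loc, n in counts.items() if n > 1)
    return issues
```

Passing an explicit now value makes the future-date check reproducible in tests.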
Common Sitemap Mistakes to Avoid
Even experienced developers make sitemap mistakes that can hurt SEO performance. Here are the most common issues we see and how to fix them:
Including Non-Canonical URLs
One of the most common mistakes is including URLs that are not the canonical version of a page. If you have both https://example.com/page and https://example.com/page/ in your sitemap, search engines may treat them as duplicates and have to guess which one to index. Include only the canonical URL for each page, and make sure it matches the page's canonical tag.
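One narrow case of this problem, URLs that differ only by a trailing slash, is easy to detect mechanically. The sketch below is an illustration under that assumption only; the function name is made up, and it does not look at canonical tags, query strings, or letter case.

```python
from collections import defaultdict


def trailing_slash_variants(urls):
    """Group URLs that are identical apart from a trailing slash."""
    groups = defaultdict(set)
    for url in urls:
        # Strip trailing slashes so both variants share one key.
        groups[url.rstrip("/")].add(url)
    return sorted(sorted(variants) for variants in groups.values() if len(variants) > 1)


if __name__ == "__main__":
    print(trailing_slash_variants([
        "https://example.com/page",
        "https://example.com/page/",
        "https://example.com/about",
    ]))
```

Each returned group is a set of candidates from which only the canonical variant should stay in the sitemap.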
Setting All lastmod Dates to the Same Value
Many CMS platforms set all lastmod dates to the current date on every sitemap regeneration. Google has explicitly stated that they may ignore lastmod entirely if the values appear inaccurate. Set lastmod to the actual date each page was last meaningfully modified.
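A quick way to spot this pattern is to measure how much of the sitemap shares a single lastmod value. The function below is a hypothetical helper, and what share counts as suspicious is a judgment call rather than a published rule.

```python
from collections import Counter


def dominant_lastmod_share(lastmods):
    """Return (value, share) for the most common lastmod, or (None, 0.0) if empty."""
    values = [v for v in lastmods if v]
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


if __name__ == "__main__":
    sample = ["2025-01-15"] * 9 + ["2024-12-20"]
    print(dominant_lastmod_share(sample))
```

A share close to 1.0 on a large sitemap suggests the dates are being stamped at regeneration time rather than reflecting real edits.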
Including Blocked or Noindex Pages
Including pages that are blocked by robots.txt or have a noindex meta tag creates conflicting signals. The sitemap says "index this page" while robots.txt says "do not crawl it" or the page itself says "do not index." Only include pages that you want indexed, and remove blocked and noindex pages from your sitemap.
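The robots.txt half of this check can be sketched with Python's built-in robots.txt parser. The robots.txt content and the blocked_urls helper below are illustrative assumptions. Detecting noindex would additionally require fetching each page and inspecting its meta tags or headers, which this sketch does not do.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content; a real check would download the live file.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    print(blocked_urls(EXAMPLE_ROBOTS, [
        "https://example.com/",
        "https://example.com/admin/settings",
    ]))
```

Any URL this returns should either be removed from the sitemap or unblocked in robots.txt, depending on which signal reflects your intent.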
Not Updating After Site Changes
A stale sitemap that does not reflect your current site structure is almost as bad as having no sitemap. If pages have been removed, redirected, or their URLs changed, the sitemap must be updated accordingly. Use dynamic sitemap generation (most CMS platforms support this) or update manually after changes.
Sitemap Best Practices
Follow these best practices to ensure your sitemap is working optimally for search engines:
Use Absolute URLs Only
Every URL in your sitemap must include the full protocol and domain: https://example.com/page, not /page.
Reference in robots.txt
Add a Sitemap: directive to your robots.txt file. This ensures all search engines can discover your sitemap automatically without requiring manual submission.
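To confirm the directive is present, you can parse robots.txt and list the sitemaps it declares. The sketch below uses RobotFileParser.site_maps(), available in Python 3.8 and later; the robots.txt content and the declared_sitemaps helper are example assumptions.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt with a Sitemap directive on its own line.
EXAMPLE_ROBOTS = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in robots.txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap directive is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(declared_sitemaps(EXAMPLE_ROBOTS))
```

The directive takes an absolute URL and is independent of any User-agent group, so it can appear anywhere in the file.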
Keep It Under 50,000 URLs
If your site has more than 50,000 URLs, split them into multiple sitemaps and create a sitemap index file. Each individual sitemap should also stay under 50MB uncompressed.
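The splitting step can be sketched as follows: chunk the URL list into groups of at most 50,000, then emit an index that points at one child sitemap per chunk. The function names and the sitemap-N.xml file naming are assumptions for illustration, and the sketch does not enforce the 50MB size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000

# Serialize the sitemap namespace as the default (unprefixed) namespace.
ET.register_namespace("", SITEMAP_NS)


def chunk_urls(urls, size=MAX_URLS):
    """Split a URL list into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(sitemap_urls):
    """Build a <sitemapindex> document that references each child sitemap."""
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    urls = [f"https://example.com/p{i}" for i in range(120_001)]
    chunks = chunk_urls(urls)
    # Hypothetical naming scheme for the child sitemap files.
    names = [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
    print(len(chunks), "child sitemaps")
    print(build_index(names))
```

Splitting by content type or site section instead of by position usually makes the resulting files easier to debug, at the cost of slightly more bookkeeping.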
Use Accurate lastmod Dates
Set the lastmod value to when the page content was last meaningfully changed. Google uses this to prioritize which pages to crawl. If the dates look inaccurate, Google may ignore the field entirely.
Include Only Indexable Pages
Your sitemap should only contain canonical, indexable URLs that return HTTP 200. Exclude redirects (301/302), error pages (404/5xx), pages with noindex tags, and pages blocked by robots.txt.
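Assuming you already have crawl data for each page, the inclusion rule reduces to a simple filter. The CrawledPage record and its field names below are hypothetical; they stand in for whatever your crawler or CMS actually reports.

```python
from dataclasses import dataclass


@dataclass
class CrawledPage:
    """Hypothetical crawl record; field names are assumptions for this sketch."""
    url: str
    status: int
    canonical: str
    noindex: bool = False
    blocked_by_robots: bool = False


def is_indexable(page):
    """True only for canonical pages that return 200 and carry no blocking signal."""
    return (
        page.status == 200
        and page.canonical == page.url
        and not page.noindex
        and not page.blocked_by_robots
    )


def sitemap_urls(pages):
    """Return the URLs that belong in the sitemap, preserving input order."""
    return [page.url for page in pages if is_indexable(page)]
```

Redirects and error pages fall out through the status check, and non-canonical duplicates fall out through the canonical comparison.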
Submit to Google Search Console
While robots.txt discovery works for most search engines, submitting your sitemap directly through Google Search Console gives you insights into crawl errors, index coverage, and processing status.
Platform-Specific Sitemap Setup
Different platforms and frameworks handle sitemaps differently. Here is how to set up sitemaps on popular platforms:
WordPress
WordPress 5.5+ includes a built-in sitemap at /wp-sitemap.xml. For more control, use plugins like Yoast SEO, Rank Math, or All in One SEO, which offer advanced sitemap customization including image sitemaps, post type filtering, and priority settings.
Shopify
Shopify automatically generates a sitemap at /sitemap.xml that includes products, collections, blog posts, and pages. It updates automatically when you add or remove content. Shopify also auto-includes the sitemap reference in robots.txt.
Next.js
Next.js (App Router) supports sitemaps via a sitemap.ts or sitemap.xml file in the app/ directory. The TypeScript approach allows dynamic generation from your data sources, ensuring the sitemap always reflects your current routes.
Static Sites (Gatsby, Hugo, Jekyll)
Static site generators typically generate sitemaps at build time. Gatsby uses gatsby-plugin-sitemap, Hugo has built-in sitemap templates, and Jekyll supports sitemaps via the jekyll-sitemap gem. Ensure your CI/CD pipeline regenerates the sitemap on every deployment.
Frequently Asked Questions
Does Google require a sitemap?
No, Google does not require a sitemap. However, Google recommends one in particular for large sites, new sites with few external links, sites with rich media content, and sites whose pages are not well linked through internal navigation. A sitemap helps Google learn about all your important pages, although it does not guarantee that they will be crawled or indexed.
Can a sitemap help with indexing issues?
Yes. If pages are not appearing in Google Search, submitting them through a sitemap is one of the first troubleshooting steps. Check the Page indexing report in Google Search Console (formerly called Index Coverage) to see if submitted URLs are being indexed. Note that a sitemap is a signal, not a guarantee: Google ultimately decides what to index based on content quality and relevance.
What is the difference between changefreq, priority, and lastmod?
lastmod indicates when the page was last modified — this is the most important field as Google actively uses it for crawl scheduling. changefreq suggests how often the page changes (daily, weekly, etc.) but Google has stated it largely ignores this field. priority is a hint about relative importance (0.0 to 1.0) within your site, but Google also largely ignores it. Focus your efforts on accurate lastmod values.
Should I compress my sitemap with gzip?
Yes, gzip compression is recommended, especially for larger sitemaps. A compressed sitemap uses the .xml.gz extension. All major search engines support gzip-compressed sitemaps. The 50MB size limit applies to the uncompressed content, so compression reduces transfer size but does not let you fit more into a single sitemap, and the 50,000 URL limit is unchanged.
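A minimal sketch of the compression step, using Python's gzip module, is shown below. The gzip_sitemap helper is a made-up name; it checks the size limit against the uncompressed bytes before compressing, which mirrors how the limit is defined.

```python
import gzip

# The 50MB limit (52,428,800 bytes) is measured before compression.
MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024


def gzip_sitemap(xml_text):
    """Return gzip-compressed sitemap bytes, enforcing the uncompressed limit."""
    raw = xml_text.encode("utf-8")
    if len(raw) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError("sitemap exceeds the 50MB uncompressed limit; split it")
    return gzip.compress(raw)


if __name__ == "__main__":
    xml_text = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
    compressed = gzip_sitemap(xml_text)
    # Write the result with the conventional .xml.gz extension.
    with open("sitemap.xml.gz", "wb") as handle:
        handle.write(compressed)
```

Many web servers can also apply gzip on the fly, in which case the file on disk can stay as plain sitemap.xml.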
How many sitemaps can I have?
There is no strict limit on the number of sitemaps. A sitemap index file can reference up to 50,000 individual sitemaps, and each sitemap can contain up to 50,000 URLs. This means you can theoretically list up to 2.5 billion URLs. In practice, organize sitemaps logically — by content type, section, or language.
Do I need separate sitemaps for images and videos?
Not necessarily. Image and video information can be included as extensions within your standard sitemap using <image:image> and <video:video> tags. However, creating separate sitemaps for each content type can make management easier for large sites with extensive media libraries.
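To show how the image extension sits inside a standard sitemap, the sketch below builds a <urlset> whose entries carry <image:image> children. The helper names and example URLs are assumptions; the image namespace URI is the one Google documents for image sitemaps.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Control the prefixes used when serializing: default for sitemap, image: for images.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def url_with_images(page_url, image_urls):
    """Build one <url> element with an <image:image> child per image."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for image_url in image_urls:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return url


def build_urlset(url_elements):
    """Wrap <url> elements in a <urlset> and serialize to a string."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    urlset.extend(url_elements)
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    entry = url_with_images(
        "https://example.com/gallery",
        ["https://example.com/img/a.jpg", "https://example.com/img/b.jpg"],
    )
    print(build_urlset([entry]))
```

The same pattern applies to the video extension, with its own namespace and a larger set of required child tags.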