Shopify Sitemap and Robots.txt: The Technical SEO Setup Guide for 2026

Shopify auto-generates your sitemap.xml and robots.txt — but most store owners don't know what's in them, whether they're correct, or how to customise them. Here's the complete technical SEO setup guide.

Rishabh Sethia, Founder & CEO | 3 November 2025 | Updated 27 March 2026 | 12 min read | 1.4k words
#shopify #technical-seo #sitemap #robots-txt #seo #google-search-console #shopify-development

Someone told you to "fix your sitemap." Or an SEO audit flagged your robots.txt configuration. Or you've been running your Shopify store for two years and never once looked at either of these files.

All three scenarios are more common than you'd think.

Shopify handles sitemaps and robots.txt automatically, which is both a blessing and a curse. A blessing because you don't need to generate them manually. A curse because most store owners assume "automatic" means "correct" — and it doesn't always.

This guide covers what Shopify actually generates, what you should verify, and how to customise robots.txt for better crawl control.

Understanding Shopify's Auto-Generated Sitemap

Every Shopify store has a sitemap at yourstore.com/sitemap.xml. This is a sitemap index — it doesn't contain your pages directly. Instead, it points to child sitemaps:

<sitemap>
  <loc>https://yourstore.com/sitemap_products_1.xml</loc>
</sitemap>
<sitemap>
  <loc>https://yourstore.com/sitemap_pages_1.xml</loc>
</sitemap>
<sitemap>
  <loc>https://yourstore.com/sitemap_collections_1.xml</loc>
</sitemap>
<sitemap>
  <loc>https://yourstore.com/sitemap_blogs_1.xml</loc>
</sitemap>

Each child sitemap contains up to 5,000 URLs. If you have more than 5,000 products, Shopify creates sitemap_products_2.xml, and so on.
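To see the index structure for yourself, a few lines of Python can list the child sitemaps. This is a minimal sketch: SAMPLE_INDEX is a hypothetical stand-in for the XML you would download from yourstore.com/sitemap.xml, and child_sitemaps is an illustrative helper name, not part of any Shopify tooling.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample; a real index would be fetched from yourstore.com/sitemap.xml.
SAMPLE_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://yourstore.com/sitemap_products_1.xml</loc></sitemap>
  <sitemap><loc>https://yourstore.com/sitemap_pages_1.xml</loc></sitemap>
  <sitemap><loc>https://yourstore.com/sitemap_collections_1.xml</loc></sitemap>
  <sitemap><loc>https://yourstore.com/sitemap_blogs_1.xml</loc></sitemap>
</sitemapindex>"""


def child_sitemaps(index_xml):
    """Return the <loc> URL of every child sitemap listed in a sitemap index."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)]


if __name__ == "__main__":
    for url in child_sitemaps(SAMPLE_INDEX):
        print(url)
```

On a live store the child URLs may carry extra query parameters, so treat each loc value as opaque and fetch it exactly as listed.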

What gets included automatically:

  • All published products
  • All published pages
  • All published collections
  • All published blog posts
  • Your homepage

What gets excluded automatically:

  • Draft products (status: draft)
  • Password-protected pages
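  • Pages hidden from search engines through Shopify's seo.hidden metafield (a noindex tag added by hand in theme code does not, on its own, remove a URL from the sitemap)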
  • Checkout pages
  • Cart pages
  • Account pages (login, register, order history)

This is generally correct behaviour. You don't want draft products or checkout pages in your sitemap.

Verifying Your Sitemap Is Correct

Open yourstore.com/sitemap.xml in a browser. Then spot-check each child sitemap:

  1. Products: Open sitemap_products_1.xml. Are all your published products listed? Are any draft or archived products accidentally appearing? Count the URLs and compare against your product count in Shopify admin.

  2. Collections: Open sitemap_collections_1.xml. Shopify includes all published collections. Check for collections you might not want indexed — internal collections used for automations, test collections, or duplicate collections created by apps.

  3. Pages: Open sitemap_pages_1.xml. Verify your important pages (About, Contact, FAQ, Policy pages) are all present.

  4. Blogs: Open sitemap_blogs_1.xml. Every published blog post should appear here. If you've been publishing blog content and it's not in the sitemap, check if the blog section is set to published in Shopify admin.

Common issue: some Shopify apps create hidden pages or collections that end up in your sitemap. If you see URLs you don't recognise, investigate whether they're from an installed app.
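The count-and-compare check can be scripted once a store has more than a handful of products. The sketch below assumes you have saved a child sitemap as text and exported the product handles you expect from Shopify admin; SAMPLE_PRODUCTS, sitemap_urls and unexpected_handles are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample of a product child sitemap.
SAMPLE_PRODUCTS = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourstore.com/products/blue-tshirt</loc></url>
  <url><loc>https://yourstore.com/products/red-hoodie</loc></url>
  <url><loc>https://yourstore.com/products/app-test-item</loc></url>
</urlset>"""


def sitemap_urls(child_xml):
    """Return every <loc> URL in a child sitemap (a <urlset> document)."""
    root = ET.fromstring(child_xml.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def unexpected_handles(urls, known_handles, prefix="/products/"):
    """Return URLs under `prefix` whose handle is not in `known_handles`."""
    flagged = []
    for url in urls:
        path = urlparse(url).path
        if path.startswith(prefix):
            handle = path[len(prefix):].strip("/")
            if handle not in known_handles:
                flagged.append(url)
    return flagged


if __name__ == "__main__":
    urls = sitemap_urls(SAMPLE_PRODUCTS)
    known = {"blue-tshirt", "red-hoodie"}  # handles you expect, e.g. from an admin export
    print(len(urls), "URLs in sitemap")
    print("Unrecognised:", unexpected_handles(urls, known))
```

Anything the script flags is a candidate for the app-generated or test content described above, not proof of a problem; confirm in Shopify admin before removing it.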

Submitting Your Sitemap to Google Search Console

If you haven't done this, do it now. It takes 30 seconds.

  1. Go to Google Search Console
  2. Select your property (your Shopify store domain)
  3. Navigate to Sitemaps in the left sidebar
  4. Enter sitemap.xml in the "Add a new sitemap" field
  5. Click Submit

Google will crawl your sitemap and report back on how many URLs were discovered, how many were indexed, and any errors.

For Bing: the process is identical in Bing Webmaster Tools. Submit the same sitemap URL.

Check back in a week. If there's a large gap between "discovered" and "indexed" URLs, that signals potential quality issues with some of your pages — thin content, duplicate content, or crawl errors.

Shopify's robots.txt: What Most Developers Don't Know

Since 2021, around the launch of Online Store 2.0, the robots.txt file has been customisable through a Liquid template called robots.txt.liquid. This was a significant change that most developers and store owners still don't know about.

Your store's robots.txt lives at yourstore.com/robots.txt. The default looks something like this (abridged; Shopify revises the list over time, so open your own file to see the current rules):

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkouts/
Disallow: /checkout
Disallow: /carts
Disallow: /account
Disallow: /*?*variant=
Disallow: /*?*q=
Disallow: /*?*sort_by=
Disallow: /*?*filter.*=
Allow: /collections/*+*
Sitemap: https://yourstore.com/sitemap.xml

These defaults are sensible. Admin pages, cart, checkout, and account pages should be blocked from crawlers. The query parameter disallows prevent Google from crawling infinite variations of filtered and sorted collection pages.
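To reason about which URLs a set of rules actually blocks, it helps to test paths against them. The sketch below is a simplified approximation of the matching that Google documents for robots.txt (the * wildcard, the $ end anchor, longest matching rule wins, Allow wins a tie). It is not Google's parser, and the RULES list is a hypothetical subset of the defaults shown above.

```python
import re
from typing import Iterable, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: Iterable[Tuple[str, str]]) -> bool:
    """Return True if `path` (including any query string) may be crawled.

    The longest matching pattern wins; on a tie, Allow beats Disallow.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# Hypothetical subset of the User-agent: * rules listed above.
RULES = [
    ("Disallow", "/admin"),
    ("Disallow", "/cart"),
    ("Disallow", "/*?*variant="),
    ("Disallow", "/*?*sort_by="),
]

if __name__ == "__main__":
    for path in ["/products/blue-tshirt",
                 "/products/blue-tshirt?variant=12345678",
                 "/collections/all?sort_by=price-ascending",
                 "/cart"]:
        print(path, "->", "allowed" if is_allowed(path, RULES) else "blocked")
```

Before relying on a result, confirm it with the robots.txt report in Google Search Console, since real crawlers handle edge cases this sketch ignores (percent-encoding, per-crawler groups).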

Customising robots.txt with robots.txt.liquid

Here's the part most guides skip. In your Shopify theme files, you can create or edit the robots.txt.liquid template.

In your theme editor: go to Online Store → Themes → Edit code → Templates → look for robots.txt.liquid.

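If it doesn't exist, create it. Shopify provides a robots Liquid object whose default_groups property holds the default rules. Loop over the groups to output the defaults, and add your custom rules inside the group for User-agent: *:

{% comment %}
  Output Shopify's default rules group by group, then append custom
  rules to the catch-all (User-agent: *) group:
    /search             internal search results pages
    /collections/*+*    tagged collection pages (common source of duplicate content)
    /pages/...          specific pages you don't want crawled
{% endcomment %}
{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules %}
    {{ rule }}
  {%- endfor %}

  {%- if group.user_agent.value == '*' %}
    {{ 'Disallow: /search' }}
    {{ 'Disallow: /collections/*+*' }}
    {{ 'Disallow: /pages/test-page' }}
    {{ 'Disallow: /pages/old-landing-page' }}
  {%- endif %}

  {%- if group.sitemap != blank %}
    {{ group.sitemap }}
  {%- endif %}
{% endfor %}

A single Disallow: /search line is enough here: robots.txt rules match by prefix, so it also covers /search?q=... URLs. After saving, open yourstore.com/robots.txt and check whether any of these rules already appear in your store's defaults before keeping a duplicate.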

When to Customise robots.txt

Customise when:

  • Internal search pages are being indexed: Shopify's /search results pages create thin content pages for every search query. Block them.
  • Tagged collection URLs create duplicate content: URLs like /collections/t-shirts/blue+cotton are auto-generated by Shopify's tag filtering. These create near-duplicate versions of your collection pages.
  • App-generated pages you don't want indexed: Some Shopify apps create public-facing pages (wishlists, comparison tools) that shouldn't be in Google's index.
  • Temporary landing pages: If you create campaign-specific pages that should live on your store but not rank in search, disallow them.

Do not customise to:

  • Block your entire /collections/ directory (you want collections indexed)
  • Block legitimate product pages
  • Try to remove pages from Google's index (robots.txt prevents crawling, not indexing — use a noindex meta tag for that)
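Since the noindex meta tag is the tool for keeping a page out of the index, it is worth confirming that the tag is really present in the rendered HTML. The following is a minimal sketch using Python's standard html.parser; has_noindex and RobotsMetaParser are illustrative names, and the HTML strings are hypothetical samples, not output from a real store.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """Return True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    hidden = '<head><meta name="robots" content="noindex, follow"></head>'
    normal = '<head><meta name="description" content="Blue cotton t-shirt"></head>'
    print(has_noindex(hidden), has_noindex(normal))
```

A crawler has to fetch the page to see the tag, so a URL that is also disallowed in robots.txt may never have its noindex read. Use one mechanism or the other for a given page, depending on whether the goal is to stop crawling or to stop indexing.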

Shopify's Canonical Tag Implementation

Shopify automatically adds canonical tags to prevent duplicate content issues. For most pages, this works correctly:

<link rel="canonical" href="https://yourstore.com/products/blue-tshirt">

But there are edge cases where Shopify's canonical implementation breaks:

Variant URLs: When a customer selects a variant, the URL changes to /products/blue-tshirt?variant=12345678. Shopify correctly canonicalises this back to the base product URL. But some themes or apps override this behaviour.

Collection-prefixed product URLs: Shopify creates URLs like /collections/summer/products/blue-tshirt when a product is accessed from a collection page. The canonical tag should point to /products/blue-tshirt. Verify this is working — some themes break it.

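Paginated collection pages: Check what /collections/all?page=2 declares as its canonical. Google's guidance is that each page in a paginated series should canonicalise to itself rather than to page one, so a canonical of /collections/all?page=2 is expected here. A theme that points every page back to /collections/all can keep products on deeper pages from being discovered.

To check canonical tags: view page source on any page and search for rel="canonical". Apart from the page parameter on paginated collections, the URL should be the cleanest version of that page, without variant parameters or collection prefixes.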
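The view-source check can also be scripted when there are many pages to verify. This sketch extracts the canonical URL from a page's HTML with Python's standard html.parser; find_canonical is a hypothetical helper and SAMPLE_PAGE stands in for HTML you would fetch from your own store.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonical(html):
    """Return the page's canonical URL, or None if it has zero or several.

    Both cases deserve a closer look: a missing tag and conflicting tags
    each leave the choice of canonical to the search engine.
    """
    parser = CanonicalParser()
    parser.feed(html)
    if len(parser.canonicals) != 1:
        return None
    return parser.canonicals[0]


# Hypothetical HTML, as it might be served at /collections/summer/products/blue-tshirt.
SAMPLE_PAGE = """<html><head>
<title>Blue T-Shirt</title>
<link rel="canonical" href="https://yourstore.com/products/blue-tshirt">
</head><body></body></html>"""

if __name__ == "__main__":
    print(find_canonical(SAMPLE_PAGE))
```

Run it against the same product reached three ways (the base URL, a ?variant= URL, and a collection-prefixed URL); all three should report the same canonical.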

The 5 Most Common Shopify SEO Configuration Errors

These come up in almost every technical SEO audit we run on Shopify stores:

1. Sitemap never submitted to Search Console. The store has been live for a year and nobody has submitted the sitemap. Google will eventually find your pages, but submission speeds up discovery and gives you visibility into indexing issues.

2. Duplicate title tags and meta descriptions. Shopify auto-generates SEO titles from product names, so if you haven't customised them, you might have 50 product pages that all follow the same generic pattern. Each page needs a unique, descriptive title tag that includes its target keyword.

3. Internal search results being indexed. Check site:yourstore.com inurl:search in Google. If you see results, your search pages are being indexed and diluting your crawl budget.

4. Tag-based collection pages creating duplicate content. Check site:yourstore.com inurl:/collections/ inurl:+ in Google. These filtered collection pages often contain the same products as the parent collection with slightly different URLs.

5. Missing or broken canonical tags on variant URLs. Open a product page, select different variants, and check if the canonical tag stays consistent. Some apps and theme customisations break this.
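Duplicate titles (error 2) are tedious to spot by eye across a large catalogue. Assuming you have already collected each page's title into a URL-to-title mapping (from a crawl export, for example), a short script can group the duplicates. The duplicate_titles function and the sample data are hypothetical.

```python
from collections import defaultdict


def duplicate_titles(titles_by_url):
    """Group URLs that share a title, ignoring case and extra whitespace.

    Returns {normalised_title: [urls...]} for titles used more than once.
    """
    groups = defaultdict(list)
    for url, title in titles_by_url.items():
        normalised = " ".join(title.lower().split())
        groups[normalised].append(url)
    return {title: sorted(urls) for title, urls in groups.items() if len(urls) > 1}


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = {
        "/products/blue-tshirt": "T-Shirt | Your Store",
        "/products/red-tshirt": "T-Shirt  | your store",
        "/products/red-hoodie": "Red Hoodie | Your Store",
    }
    for title, urls in duplicate_titles(sample).items():
        print(title, "->", urls)
```

The same grouping works for meta descriptions; pass a URL-to-description mapping instead.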

The Audit Workflow

Here's a practical 30-minute technical SEO check you can run on your Shopify store right now:

  1. Open yourstore.com/sitemap.xml — verify all child sitemaps load correctly
  2. Submit the sitemap to Google Search Console if you haven't already
  3. Open yourstore.com/robots.txt — verify the default rules are present
  4. Run site:yourstore.com inurl:search in Google — if results appear, add a search disallow to robots.txt
  5. Check canonical tags on three pages: a product page, a collection page, and a product page accessed from a collection
  6. Verify your homepage has a unique title tag and meta description
  7. Spot-check 5 product pages for unique title tags

This takes 30 minutes and catches the most impactful issues.

For the performance side of Shopify SEO, our image optimization guide covers the other half of the equation — because technical SEO configuration means nothing if your pages take 4 seconds to load.

And for the accessibility side, our WCAG compliance guide covers the overlap between accessibility and SEO — alt text, heading structure, and semantic HTML all affect both.

If you want a comprehensive technical SEO audit of your Shopify store, book a discovery call with us. At Innovatrix Infotech, technical SEO is part of every build we deliver — not an afterthought bolted on post-launch.

Written by Rishabh Sethia, Founder & CEO

Rishabh Sethia is the founder and CEO of Innovatrix Infotech, a Kolkata-based digital engineering agency. He leads a team that delivers web development, mobile apps, Shopify stores, and AI automation for startups and SMBs across India and beyond.
