By Dylan Hunt

March 17th, 2026

A Shopify Technical SEO Checklist: Canonicals, Sitemaps, and Crawl Budget

Shopify handles a lot of technical SEO for you, which is a blessing and a trap. The defaults are good enough that many stores never look closer, and then they hit the handful of issues Shopify's structure creates and have no idea where they came from. This is the checklist we run, focused on the things that are actually Shopify-specific rather than generic advice you have read ten times.

Canonical tags and the duplicate-URL problem

Shopify can serve the same product at more than one URL. A product reached through a collection often gets a URL like /collections/jackets/products/rain-shell, while the same product also lives at /products/rain-shell. Without help, that looks like duplicate content.

Shopify's default themes handle this correctly: the canonical tag on both URLs points at the clean /products/rain-shell version. The trap is custom themes and heavy customization, where someone occasionally breaks the canonical logic. So the first check is simple. View the source of a product reached through a collection and confirm the canonical points at the bare /products/ URL, not the collection-scoped one.

The second canonical issue is variants. A URL like /products/rain-shell?variant=12345 should canonicalize to the base product, not stand alone. Confirm that too, because a theme that treats each variant as its own indexable page splits your ranking signal across near-identical pages.
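Both canonical checks can be scripted once you have the page source. Here is a minimal sketch in Python, using only the standard library. The names CanonicalFinder and canonical_is_clean and the example.com URLs are ours for illustration. The rule it applies (exactly one canonical tag, a path of the form /products/<handle>, no query string) is a simplification and would need adjusting for stores that use locale prefixes in their URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def canonical_is_clean(html: str) -> bool:
    """True when the page has exactly one canonical tag and it points at
    a bare /products/<handle> URL with no query string."""
    finder = CanonicalFinder()
    finder.feed(html)
    if len(finder.canonicals) != 1:
        return False
    parts = urlsplit(finder.canonicals[0])
    segments = [s for s in parts.path.split("/") if s]
    return len(segments) == 2 and segments[0] == "products" and not parts.query
```

Feed it the source of the collection-scoped URL and of a ?variant= URL. Both should return True, because both should carry the same canonical as the base product page.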

Faceted navigation and parameter sprawl

Collection filters generate URLs. Filter by size, color, and price, and you can produce thousands of parameterized URLs that are all thin variations of one collection. Left unmanaged, crawlers waste their time on them and your genuinely important pages get crawled less often.

Shopify's newer filtering uses parameters that search engines mostly understand, but you should still decide deliberately which filtered views you want indexed, if any. In most cases the answer is none. The base collection page is the one that should rank, and the filtered combinations should be reachable for shoppers without inviting crawlers to index every permutation. Keep an eye on this in Search Console's Pages report, where a flood of indexed parameter URLs is the tell.
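If you export a list of indexed URLs, a few lines of Python can show which collections are generating the sprawl. This is a rough sketch, not a definitive classifier. It assumes filter parameters start with "filter." and that sorting uses "sort_by", which matches what we typically see on Shopify storefronts but should be checked against your own URLs. The function names are ours.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Assumption: storefront filter parameters look like filter.v.option.color
# or filter.p.product_type, and sorting uses sort_by. Adjust for your theme.
FILTER_PREFIXES = ("filter.",)
SORT_PARAMS = {"sort_by"}


def is_filtered_view(url: str) -> bool:
    """True for a collection URL that carries filter or sort parameters."""
    parts = urlsplit(url)
    if not parts.path.startswith("/collections/"):
        return False
    keys = [key for key, _ in parse_qsl(parts.query, keep_blank_values=True)]
    return any(key.startswith(FILTER_PREFIXES) or key in SORT_PARAMS for key in keys)


def filtered_counts(urls) -> Counter:
    """Counts filtered views per collection path."""
    counts = Counter()
    for url in urls:
        if is_filtered_view(url):
            counts[urlsplit(url).path] += 1
    return counts
```

Calling filtered_counts(urls).most_common(10) on the export gives the collections with the most filtered views, which is usually where to look first.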

The sitemap you get and the sitemap you might want

Shopify generates sitemap.xml automatically and keeps it current, split into child sitemaps for products, collections, pages, and blogs. For most stores this is fine and you should not fight it.

Two things are worth knowing. First, the auto sitemap includes everything published, including products and pages you might prefer to keep out of the index. You control inclusion through the product's or page's availability and visibility, not through editing the sitemap directly. Second, Shopify does not let you edit the generated sitemap file itself, so if you need custom sitemap behavior, that is a platform constraint to plan around rather than a bug to fix.

Confirm your sitemap is referenced from robots.txt, submit it in Search Console, and then mostly leave it alone. The sitemap is rarely where Shopify SEO problems actually live.
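The robots.txt reference and the child sitemap list are both easy to confirm from saved copies of the two files. Here is a small sketch using Python's standard library. It assumes you have the contents of robots.txt and sitemap.xml as strings, and the function names are ours. RobotFileParser.site_maps() needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_in_robots(robots_txt: str) -> list[str]:
    """Returns the Sitemap: URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


def child_sitemaps(index_xml: str) -> list[str]:
    """Returns the <loc> of each child sitemap listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

An empty list from sitemaps_in_robots means the reference is missing. The output of child_sitemaps should show product, collection, page, and blog entries.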

robots.txt and the AI crawlers

Shopify lets you customize robots.txt through robots.txt.liquid. Most stores never need to, and editing it carelessly is a good way to deindex yourself, so be conservative.

The one edit worth thinking about in 2026 is the AI crawlers. The bots behind ChatGPT, Perplexity, and Claude, named GPTBot, OAI-SearchBot, PerplexityBot, and ClaudeBot among others, do not all do the same job. OpenAI, for example, documents GPTBot as a training-data crawler and OAI-SearchBot as the crawler behind ChatGPT's search results. The search and retrieval bots are how a growing share of shoppers now find products. If your goal is to appear in those answers, do not block them. Some stores blanket-blocked AI bots in a reflex against scraping and quietly opted out of an entire discovery channel. You can make a considered choice about training versus retrieval, but make it on purpose, not by accident.
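To see what your current robots.txt actually tells these bots, you can run the file through Python's built-in parser. This is a sketch under some assumptions. The user agent list is just the four names mentioned above, the function name is ours, and urllib.robotparser applies the first matching rule in a group, which can differ from how a given crawler resolves conflicting Allow and Disallow lines. Treat the result as a sanity check, not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The crawler names mentioned in this article; extend the list as needed.
AI_USER_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Maps each AI user agent to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}
```

Run it against a product URL and a collection URL. Any False you did not put there on purpose is worth tracing back to the rule that caused it.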

Crawl budget: real for large catalogs, a non-issue for small ones

Crawl budget is the attention a search engine is willing to spend on your site. For a 200-product store it is effectively unlimited and not worth a second thought. For a 50,000-product catalog it is a genuine constraint, and the parameter sprawl above is the fastest way to waste it.

The levers that matter for large stores: keep the parameter explosion in check, make sure your internal linking surfaces important products within a few clicks of the homepage, fix broken links and long redirect chains that burn crawls, and keep the site fast so each crawl costs less. None of this is exotic. It is mostly about not making the crawler work harder than it should.
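Two of those levers, click depth and redirect chains, are straightforward to measure once you have crawl data. The sketch below assumes you already have an internal link graph and a redirect map as plain dictionaries, for example from a crawler export. The function names and data shapes are ours for illustration.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], start: str = "/") -> dict[str, int]:
    """Breadth-first search from the homepage. Returns the minimum number
    of clicks needed to reach each page that is reachable at all."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def redirect_chain(redirects: dict[str, str], url: str, limit: int = 10) -> list[str]:
    """Follows a redirect map from url and returns every hop, stopping at
    the final destination, at a loop, or after limit hops."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= limit:
        target = redirects[chain[-1]]
        chain.append(target)
        if chain.count(target) > 1:  # loop detected, stop here
            break
    return chain
```

Products missing from the click_depths result are orphaned from the homepage, and products with a depth above three or four are candidates for better internal linking. Any chain longer than two entries means more than one hop and is worth collapsing into a single redirect.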

A short order of operations

When we audit a Shopify store's technical SEO, this is the sequence:

  1. Confirm canonicals on products reached via collections and on variant URLs.
  2. Check Search Console's pages report for indexed parameter URLs and thin filtered views.
  3. Confirm the sitemap is current, referenced in robots.txt, and submitted.
  4. Review robots.txt, especially the AI crawler stance, and make it deliberate.
  5. For large catalogs, audit internal linking, redirects, and crawl waste.
  6. Fix Core Web Vitals, which feed both ranking and crawl efficiency.

Most stores are healthy on three or four of these and quietly bleeding on one. The value is in finding which one. Shopify gives you a strong default; technical SEO on Shopify is mostly the work of confirming the default held and cleaning up the few places its structure works against you.

This is the groundwork we lay before anything else on a store, because structured data and AI readiness sit on top of it. A page an agent cannot crawl is a page it cannot recommend, no matter how good the schema on it is.

Written by Dylan Hunt, Founder, Caffeine and Commerce. We build Shopify stores that rank and that AI agents can read.