What this page shows. A full audit of Nature’s Path’s XML sitemap architecture: the parent index at https://naturespath.com/sitemap.xml, its 24 child sitemaps across 6 locale markets (en-US, en-CA, fr-CA, en-GB, en-MX, es-MX), the 4 content types under each (products, collections, pages, blogs), and 8,773 total URLs.

Sitemap hygiene determines what Google can find and how quickly. A bloated sitemap dilutes crawl budget and trains Google that low-value URLs are publishable content; a clean sitemap focuses crawl attention on the pages that actually earn rankings. For a site the size of NP (8,773 URLs across 6 locales) the bloat cost compounds each crawl cycle until the hygiene work is done.

How to read the findings. Start with URL Inventory to see the locale distribution, then Validation Checks for the XML-level pass/fail list. The 28 Pages to Remove section is the bulk of the remediation work. The final two sections name Shopify platform limits that cannot be fixed in the theme — document them, escalate to Shopify support, and move on.

What the data says about NP today. All 24 child sitemaps parse cleanly and no file approaches the 50,000-URL limit. Major hygiene opportunities: 28 pages flagged for removal across 6 locales (170 total URL instances), IndexNow not enabled, no AI-crawler directives in robots.txt, and a Shopify bug that inserts every locale homepage into the products sitemap. Product <lastmod> values are all identical (Shopify-generated at fetch time), giving Google zero scheduling signal for products.

URL Inventory by Locale

What this section shows. The breakdown of the 8,773 sitemap URLs by locale and content type, with a totals row at the bottom.

URL distribution is the first signal of locale bloat. When a locale carries the same blog count as the default market but only a handful of translated products, most of those blog URLs are English content under a locale prefix (thin or duplicate at the URL level). The distribution table below makes that pattern visible at a glance.

How to read the table. Compare Products across locales to spot thin markets (en-GB at 7, en-MX / es-MX at 24 each). Blogs and Pages are replicated across all 6 locales regardless of translation, which inflates the total and is the primary target for locale-level suppression.

| Locale | Products | Collections | Pages | Blogs | Total |
|---|---|---|---|---|---|
| en-us (default) | 136 | 69 | 65 | 1,242 | 1,512 |
| en-ca | 163 | 69 | 65 | 1,242 | 1,539 |
| fr-ca | 163 | 69 | 65 | 1,242 | 1,539 |
| en-gb | 7 | 69 | 65 | 1,242 | 1,383 |
| en-mx | 24 | 69 | 65 | 1,242 | 1,400 |
| es-mx | 24 | 69 | 65 | 1,242 | 1,400 |
| TOTAL | 517 | 414 | 390 | 7,452 | 8,773 |

No child sitemap approaches the 50,000-URL limit; the largest single file is each locale's blogs sitemap at 1,242 URLs.
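The locale breakdown above can be reproduced from the sitemap index itself. A minimal sketch using only the standard library (the HTTP fetch of each file is assumed to happen separately; these functions operate on the raw XML text):

```python
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace, shared by sitemapindex and urlset files
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def child_sitemaps(index_xml: str) -> list[str]:
    """Extract child sitemap URLs (query strings intact) from a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text for loc in root.findall("sm:sitemap/sm:loc", NS)]

def count_urls(sitemap_xml: str) -> int:
    """Count <url> entries in a single child sitemap."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url", NS))
```

Running `count_urls` over each of the 24 child files and grouping by the locale prefix in the file's URL yields the table above.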

Validation Checks

What this section shows. An 8-point XML-level validation pass against each sitemap file: parse validity, per-file URL limit, deprecated priority / changefreq tags, <lastmod> accuracy for products and blogs, robots.txt declaration, and IndexNow status.

XML validity is a baseline, not a finish line. A sitemap that parses cleanly can still provide zero scheduling signal to Google if its timestamps are generated at fetch time rather than at record modification. The product <lastmod> failure below is the clearest example: valid XML, useless content.

How to read the table. PASS rows are clean; INFO rows flag non-harmful quirks Google ignores; FAIL and MISSING rows are real remediation targets. The MISSING row on IndexNow is a 5-minute Shopify Admin toggle and the highest-leverage single fix on this page.

| Check | Status | Notes |
|---|---|---|
| XML validity | PASS | All 24 child sitemaps parse without errors |
| 50,000 URL limit per file | PASS | Max 1,242 URLs |
| <priority> tags | PASS | Not present (deprecated) |
| <changefreq> tags | INFO | Present in all child sitemaps; ignored by Google; Shopify-generated, cannot be removed without a custom sitemap solution |
| <lastmod> on products | FAIL | All product lastmod values are identical timestamps (generated at fetch time, not real modification dates). Shopify-dynamic. Zero scheduling signal to Google. |
| <lastmod> on blogs | PASS | 1,145 unique lastmod values across 1,242 blog URLs — genuine modification dates |
| robots.txt sitemap declaration | PASS | Declared three times (wildcard, AhrefsBot, AhrefsSiteAudit) — redundant but not harmful |
| IndexNow | MISSING | No IndexNow key file. Shopify natively supports IndexNow — enable via Admin > Online Store > Preferences > SEO > IndexNow (5-minute toggle; covers Bing + Yandex) |
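The product <lastmod> failure is easy to test for mechanically in future audits: if every lastmod in a file carries the same timestamp, the values were almost certainly generated at fetch time rather than at record modification. A sketch of that check:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def lastmod_signal(sitemap_xml: str) -> dict:
    """Summarize <lastmod> values; identical-everywhere suggests fetch-time generation."""
    values = [el.text for el in ET.fromstring(sitemap_xml).findall("sm:url/sm:lastmod", NS)]
    return {
        "total": len(values),
        "unique": len(set(values)),
        # More than one URL but only one distinct timestamp: no scheduling signal
        "fetch_time_suspect": len(values) > 1 and len(set(values)) == 1,
    }
```

Against the blogs sitemap this reports 1,145 unique values across 1,242 URLs (healthy); against any products sitemap it flags the fetch-time pattern.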

28 Pages to Remove (170 URL instances across 6 locales)

What this section shows. The 28 URLs that should not be in the sitemap at all, grouped into three buckets (internal / B2B, compliance boilerplate, thin / ghost / expired) plus a Shopify bug that inserts every locale homepage into the products sitemap.

Sitemap bloat is a direct crawl-budget tax. Each of the 28 URLs listed here is replicated across 6 locales, so removing them strips 170 URL instances from Google’s crawl queue. That reclaimed crawl budget redirects to product and collection pages where ranking gains actually compound revenue.

How to read the tables. Each bucket names the URL pattern and the reason for removal. Effort is low: noindex meta tag per template plus sitemap suppression. The Shopify homepage-in-products bug is a platform-level issue; escalate to Shopify support and document, do not try to fix in theme code.

4a. Internal / B2B Pages (4 URLs × 6 = 24 instances)

| URL | Reason |
|---|---|
| /pages/b2b-login | Login page, no index value |
| /pages/wholesale-customers-homepage | Internal wholesale hub |
| /collections/team-member-portal | Employee-only collection |
| /collections/wholesale-camp | Internal wholesale collection |

4b. Compliance / Legal Boilerplate (7 URLs × 6 = 42 instances)

  • /pages/gdpr-compliance
  • /pages/ccpa-cpra-compliance
  • /pages/appi-compliance
  • /pages/pipeda-compliance
  • /pages/vcdpa-compliance-1
  • /pages/ccpa-compliance
  • /pages/us-laws-compliance

4c. Thin / Ghost / Expired Pages (17 URLs × 6 = 102 instances)

| URL | Reason |
|---|---|
| /pages/natures-path-email-signup | Email capture form only |
| /pages/recipes-search-results | Dynamic search results |
| /pages/love-child-newsletter | Newsletter signup |
| /pages/protein-granola-giveaway | Expired promotion |
| /pages/love-crunch-protein-granola-kit-sweepstakes-official-rules | Expired sweepstakes rules |
| /pages/video-inventory-anitas-organic-mill | Internal video inventory |
| /collections/frontpage | Shopify default ghost collection, always empty |
| /collections/united-states + /collections/usa | Locale-segmentation ghost collections |
| /collections/bestseller-ca / /bestsellers-mx-1 / /bestsellers-uk | Internal sorting collections |
| /collections/np-products-mx + /collections/np-products-uk | Market internal collections |
| /collections/baby-purees-surplus-sale + /collections/baby-toddler-snacks-sale | Clearance, likely empty |
| /collections/que-pasa-ymal | “You May Also Like” algorithmic collection |

Additional: Homepage in Products Sitemap (Shopify bug)

Every locale’s product sitemap includes the locale homepage
The root locale URL (/, /en-ca, /en-gb, etc.) is emitted into the products sitemap as the first entry; this is a Shopify platform bug. The homepage is already indexed by Googlebot via normal crawling and does not need a sitemap entry. Not harmful, but technically incorrect, and it cannot be fixed without a custom sitemap solution, so flag it to Shopify support.
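Future audit runs can flag the bug with a one-line path check: a products sitemap should contain only /products/ paths, so any root or bare-locale path is the leaked homepage. A sketch, with the locale prefixes hard-coded from the six markets in this audit:

```python
from urllib.parse import urlparse

# Normalized homepage paths for the six locale markets ("" is the en-us root)
LOCALE_ROOTS = {"", "/en-ca", "/fr-ca", "/en-gb", "/en-mx", "/es-mx"}

def leaked_homepages(product_urls: list[str]) -> list[str]:
    """Return products-sitemap entries that are actually locale homepages."""
    return [u for u in product_urls
            if urlparse(u).path.rstrip("/") in LOCALE_ROOTS]
```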

Blog Sitemap (US Locale)

What this section shows. The 11 blog handles that make up the 1,242-URL US blog sitemap, with an article count per handle plus freshness and locale replication observations.

Blog sitemap composition exposes which content silos are active versus legacy. Recipes (561) and posts (545) do the heavy lifting; one-article handles (anitas-tips-and-techniques, anitas-guides) and the internal /blogs/wholesale handle (6 articles, should not be public) are the cleanup targets. Locale replication of English content across fr-ca / es-mx compounds the problem.

How to read the table. High-count handles are active content silos worth investing in. The highlighted /blogs/wholesale row is tagged for removal. The check-list below names freshness, locale replication, and the wholesale removal as the three remediation items.

| Blog handle | Article count |
|---|---|
| /blogs/recipes | 561 |
| /blogs/posts | 545 |
| /blogs/press-releases | 42 |
| /blogs/awards | 40 |
| /blogs/anitas-recipes | 15 |
| /blogs/anitas-community-stories | 8 |
| /blogs/brother-nature-recipes | 6 |
| /blogs/love-child-blog | 6 |
| /blogs/wholesale | 6 (REMOVE — internal B2B) |
| /blogs/anitas-tips-and-techniques | 1 |
| /blogs/anitas-guides | 1 |
  • Content freshness: 920/1,242 lastmod values (74%) in 2024, 176 (14%) in 2025, 67 (5%) in 2026 — accurately reflects publishing cadence
  • Locale replication of English blog content: all 1,242 blog URLs are replicated across all 6 locale sitemaps. For fr-ca and es-mx where content is served in English with locale prefix, this creates thin/duplicate content at URL level. Recommendation: suppress non-English locale blog URLs from sitemap if content is not translated
  • Remove /blogs/wholesale blog and its 6 articles (6 × 6 = 36 URL instances)
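The per-handle counts can be recomputed from the raw blog sitemap by grouping URL paths on their /blogs/<handle>/ segment, assuming the standard Shopify article URL shape /blogs/<handle>/<article-slug>. A sketch:

```python
from collections import Counter
from urllib.parse import urlparse

def handle_counts(blog_urls: list[str]) -> Counter:
    """Group blog article URLs by their /blogs/<handle>/ segment."""
    counts = Counter()
    for url in blog_urls:
        parts = urlparse(url).path.strip("/").split("/")
        # Count only article URLs (three segments); skip bare blog index pages
        if len(parts) >= 3 and parts[0] == "blogs":
            counts[parts[1]] += 1
    return counts
```

The same grouping run per locale sitemap exposes the replication pattern: identical handle counts in fr-ca and es-mx confirm the English content is simply mirrored under a locale prefix.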

robots.txt + IndexNow Gaps

What this section shows. Two crawler-control gaps: missing AI-crawler directives in robots.txt (GPTBot, ClaudeBot, PerplexityBot, CCBot, Diffbot, Amazonbot), and IndexNow not yet enabled in Shopify Admin.

Crawler-directive clarity is the line between passive and intentional indexing. AI training crawlers are now routinely addressed by name so content licensing stance is explicit; IndexNow pushes product and blog updates to Bing and Yandex within minutes instead of waiting for the next crawl cycle. Both are low-effort additions with durable reputational and freshness payoffs.

How to read the callouts. The warning callout names the AI crawler directives and the recommended block set. The info callout names the IndexNow enable step (5-minute Shopify Admin toggle) and clarifies the Google impact: IndexNow does not reach Google, so Google freshness depends on Googlebot's own crawl scheduling plus sitemap <lastmod> (limited by the product-timestamp issue).

robots.txt gap — no AI crawler directives
GPTBot, ClaudeBot, PerplexityBot, CCBot, and Diffbot are not explicitly addressed. They fall under User-agent: * but major AI training crawlers are now routinely named explicitly for content licensing clarity. Given NP is an organic brand with content investment, blocking AI training crawlers is defensible and increasingly expected in the CPG category. Recommended addition:
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Amazonbot
Disallow: /
```
Important: blocking AI training crawlers prevents training data inclusion but does NOT prevent GEO/AI citation from Googlebot-indexed content. Google’s SGE, Perplexity, and ChatGPT-with-search primarily use live search results — not direct crawl — for citations.
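Once the directives ship, Python's standard robots parser can verify them without a browser. A sketch testing the recommended rules (the rule text below is an abbreviated copy of the recommendation, which is the assumption under test):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""  # abbreviated; the live file would list all six AI crawlers

def is_blocked(agent: str, url: str = "https://naturespath.com/") -> bool:
    """True if the given user agent is disallowed from the URL under RULES."""
    rp = RobotFileParser()
    rp.parse(RULES.splitlines())
    return not rp.can_fetch(agent, url)
```

Because no `User-agent: *` block appears in the snippet, unnamed crawlers such as Googlebot remain allowed, which matches the intent: block AI training crawlers only, leave search crawling untouched.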
IndexNow — enable in Shopify admin (5 minutes)
Shopify natively supports IndexNow via its Bing/IndexNow partnership. Enable it via Online Store > Preferences > Search engine optimization > IndexNow. Every product publish, collection update, and page change is then pinged to Bing and Yandex within minutes. Impact on Google: zero (Google has not adopted IndexNow). For Google, freshness is governed by Googlebot's own crawl scheduling plus sitemap <lastmod>; Search Console no longer exposes a crawl-rate setting. Given NP's product lastmod issue (all values identical), the <lastmod> signal provides no scheduling value for products, leaving organic crawl demand as the only freshness lever for Google.
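Shopify sends the pings automatically once the toggle is on; for reference, the IndexNow protocol itself is just a JSON POST to api.indexnow.org. A sketch of the payload shape (the key value and key-file location below are hypothetical placeholders):

```python
def indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body for a POST to https://api.indexnow.org/indexnow."""
    return {
        "host": host,
        "key": key,
        # The key file must be publicly served from the host to prove ownership
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
```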

Shopify Sitemap Quirks (Reference)

What this section shows. Three Shopify platform behaviors that affect sitemap output and cannot be fixed in theme code: query-string requirements on child sitemap URLs, auto-inserted <changefreq> tags, and dynamic product <lastmod> timestamps.

Platform-level limits need documented expectations. Spending engineering time fighting Shopify defaults is expensive; the right move is to capture each limit, state the workaround, and file a Shopify support ticket if any of the three ever becomes blocking. Future audit cycles then reference this section instead of re-discovering the same issues.

How to read the callouts. The first callout documents the query-string fetch requirement for anyone re-running the sitemap pull. The second names the three items that cannot be fixed without a custom sitemap solution, and is the reference document to cite in conversations with the NP IT team.

Query-string requirement on child sitemaps
Shopify sitemap URLs require ?from=X&to=Y query parameters; without them the origin returns HTTP 400. The blog sitemap is the exception (the bare path works). Always fetch the sitemap index first and extract the full child URLs, query strings included; don't assume a bare path will work.
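A quick guard when re-running the pull is to confirm each extracted child URL still carries its from/to parameters before fetching (blog sitemaps are exempt per the note above). A sketch:

```python
from urllib.parse import urlparse, parse_qs

def has_range_params(sitemap_url: str) -> bool:
    """True if the child sitemap URL carries the from/to params Shopify requires."""
    qs = parse_qs(urlparse(sitemap_url).query)
    return "from" in qs and "to" in qs
```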
Cannot-fix items without custom sitemap
  • <changefreq> tags auto-inserted by Shopify — Google ignores them, but cannot be removed
  • Product <lastmod> is auto-generated at fetch time, not actual product modification date
  • Homepage leaks into products sitemap as first entry per locale

Implementation Checklist
