What this page shows. A full audit of Nature’s Path’s XML sitemap architecture: the parent index at https://naturespath.com/sitemap.xml, its 24 child sitemaps across 6 locale markets (en-US, en-CA, fr-CA, en-GB, en-MX, es-MX), the 4 content types under each (products, collections, pages, blogs), and 8,773 total URLs.

Sitemap hygiene determines what Google can find and how quickly. A bloated sitemap dilutes crawl budget and trains Google that low-value URLs are publishable content; a clean sitemap focuses crawl attention on the pages that actually earn rankings. For a site the size of NP (8,773 URLs across 6 locales) the bloat cost compounds each crawl cycle until the hygiene work is done.

How to read the findings. Start with URL Inventory to see the locale distribution, then Validation Checks for the XML-level pass/fail list. The 28 Pages to Remove section is the bulk of the remediation work. The final two sections name Shopify platform limits that cannot be fixed in the theme — document them, escalate to Shopify support, and move on.

What the data says about NP today. All 24 child sitemaps parse cleanly and no file approaches the 50,000-URL limit. Major hygiene opportunities: 28 pages flagged for removal across 6 locales (170 total URL instances), IndexNow not enabled, no AI-crawler directives in robots.txt, and a Shopify bug that inserts every locale homepage into the products sitemap. Product <lastmod> values are all identical (Shopify-generated at fetch time), giving Google zero scheduling signal for products.

URL Inventory by Locale

What this section shows. The breakdown of the 8,773 sitemap URLs by locale and content type, with a totals row at the bottom.

URL distribution is the first signal of locale bloat. When a locale carries the same blog count as the default market but only a handful of translated products, most of those blog URLs are English content under a locale prefix (thin or duplicate at the URL level). The distribution table below makes that pattern visible at a glance.

How to read the table. Compare Products across locales to spot thin markets (en-GB at 7, en-MX / es-MX at 24 each). Blogs and Pages are replicated across all 6 locales regardless of translation, which inflates the total and is the primary target for locale-level suppression.

| Locale | Products | Collections | Pages | Blogs | Total |
|---|---|---|---|---|---|
| en-us (default) | 136 | 69 | 65 | 1,242 | 1,512 |
| en-ca | 163 | 69 | 65 | 1,242 | 1,539 |
| fr-ca | 163 | 69 | 65 | 1,242 | 1,539 |
| en-gb | 7 | 69 | 65 | 1,242 | 1,383 |
| en-mx | 24 | 69 | 65 | 1,242 | 1,400 |
| es-mx | 24 | 69 | 65 | 1,242 | 1,400 |
| TOTAL | 517 | 414 | 390 | 7,452 | 8,773 |

No child sitemap approaches the 50,000-URL limit; the largest single file is each locale's blogs sitemap at 1,242 URLs.
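The locale breakdown above can be reproduced from the sitemap index itself. A minimal sketch using only the standard library (the HTTP fetch of each file is assumed to happen separately; these functions operate on the raw XML text):

```python
import xml.etree.ElementTree as ET

# Standard sitemap protocol namespace, shared by sitemapindex and urlset files
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def child_sitemaps(index_xml: str) -> list[str]:
    """Extract child sitemap URLs (query strings intact) from a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text for loc in root.findall("sm:sitemap/sm:loc", NS)]

def count_urls(sitemap_xml: str) -> int:
    """Count <url> entries in a single child sitemap."""
    root = ET.fromstring(sitemap_xml)
    return len(root.findall("sm:url", NS))
```

Running `count_urls` over each of the 24 child files and grouping by the locale prefix in the file's URL yields the table above.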

Validation Checks

What this section shows. An 8-point XML-level validation pass against each sitemap file: parse validity, per-file URL limit, deprecated priority / changefreq tags, <lastmod> accuracy for products and blogs, robots.txt declaration, and IndexNow status.

XML validity is a baseline, not a finish line. A sitemap that parses cleanly can still provide zero scheduling signal to Google if its timestamps are generated at fetch time rather than at record modification. The product <lastmod> failure below is the clearest example: valid XML, useless content.

How to read the table. PASS rows are clean; INFO rows flag non-harmful quirks Google ignores; FAIL and MISSING rows are real remediation targets. The MISSING row on IndexNow is a 5-minute Shopify Admin toggle and the highest-leverage single fix on this page.

| Check | Status | Notes |
|---|---|---|
| XML validity | PASS | All 24 child sitemaps parse without errors |
| 50,000 URL limit per file | PASS | Max 1,242 URLs |
| <priority> tags | PASS | Not present (deprecated) |
| <changefreq> tags | INFO | Present in all child sitemaps; ignored by Google; Shopify-generated, cannot be removed without a custom sitemap solution |
| <lastmod> on products | FAIL | All product lastmod values are identical timestamps (generated at fetch time, not real modification dates). Shopify-dynamic. Zero scheduling signal to Google. |
| <lastmod> on blogs | PASS | 1,145 unique lastmod values across 1,242 blog URLs — genuine modification dates |
| robots.txt sitemap declaration | PASS | Declared three times (wildcard, AhrefsBot, AhrefsSiteAudit) — redundant but not harmful |
| IndexNow | MISSING | No IndexNow key file. Shopify natively supports IndexNow — enable via Admin > Online Store > Preferences > SEO > IndexNow (5-minute toggle; covers Bing + Yandex) |
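The product <lastmod> failure is easy to test for mechanically in future audits: if every lastmod in a file carries the same timestamp, the values were almost certainly generated at fetch time rather than at record modification. A sketch of that check:

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def lastmod_signal(sitemap_xml: str) -> dict:
    """Summarize <lastmod> values; identical-everywhere suggests fetch-time generation."""
    values = [el.text for el in ET.fromstring(sitemap_xml).findall("sm:url/sm:lastmod", NS)]
    return {
        "total": len(values),
        "unique": len(set(values)),
        # More than one URL but only one distinct timestamp: no scheduling signal
        "fetch_time_suspect": len(values) > 1 and len(set(values)) == 1,
    }
```

Against the blogs sitemap this reports 1,145 unique values across 1,242 URLs (healthy); against any products sitemap it flags the fetch-time pattern.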

28 Pages to Remove (170 URL instances across 6 locales)

What this section shows. The 28 URLs that should not be in the sitemap at all, grouped into three buckets (internal / B2B, compliance boilerplate, thin / ghost / expired) plus a Shopify bug that inserts every locale homepage into the products sitemap.

Sitemap bloat is a direct crawl-budget tax. Each of the 28 URLs listed here is replicated across 6 locales, so removing them strips 170 URL instances from Google’s crawl queue. That reclaimed crawl budget redirects to product and collection pages where ranking gains actually compound revenue.

How to read the tables. Each bucket names the URL pattern and the reason for removal. Effort is low: noindex meta tag per template plus sitemap suppression. The Shopify homepage-in-products bug is a platform-level issue; escalate to Shopify support and document, do not try to fix in theme code.

4a. Internal / B2B Pages (4 URLs × 6 = 24 instances)

| URL | Reason |
|---|---|
| /pages/b2b-login | Login page, no index value |
| /pages/wholesale-customers-homepage | Internal wholesale hub |
| /collections/team-member-portal | Employee-only collection |
| /collections/wholesale-camp | Internal wholesale collection |

4b. Compliance / Legal Boilerplate (7 URLs × 6 = 42 instances)

  • /pages/gdpr-compliance
  • /pages/ccpa-cpra-compliance
  • /pages/appi-compliance
  • /pages/pipeda-compliance
  • /pages/vcdpa-compliance-1
  • /pages/ccpa-compliance
  • /pages/us-laws-compliance

4c. Thin / Ghost / Expired Pages (17 URLs × 6 = 102 instances)

| URL | Reason |
|---|---|
| /pages/natures-path-email-signup | Email capture form only |
| /pages/recipes-search-results | Dynamic search results |
| /pages/love-child-newsletter | Newsletter signup |
| /pages/protein-granola-giveaway | Expired promotion |
| /pages/love-crunch-protein-granola-kit-sweepstakes-official-rules | Expired sweepstakes rules |
| /pages/video-inventory-anitas-organic-mill | Internal video inventory |
| /collections/frontpage | Shopify default ghost collection, always empty |
| /collections/united-states + /collections/usa | Locale-segmentation ghost collections |
| /collections/bestseller-ca / /bestsellers-mx-1 / /bestsellers-uk | Internal sorting collections |
| /collections/np-products-mx + /collections/np-products-uk | Market internal collections |
| /collections/baby-purees-surplus-sale + /collections/baby-toddler-snacks-sale | Clearance, likely empty |
| /collections/que-pasa-ymal | “You May Also Like” algorithmic collection |

Additional: Homepage in Products Sitemap (Shopify bug)

Every locale’s product sitemap includes the locale homepage
The root locale URL (/, /en-ca, /en-gb, etc.) is emitted into the products sitemap as the first entry; this is a Shopify platform bug. The homepage is already indexed by Googlebot via normal crawling and does not need a sitemap entry. Not harmful, but technically incorrect, and it cannot be fixed without a custom sitemap solution, so flag it to Shopify support.
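Future audit runs can flag the bug with a one-line path check: a products sitemap should contain only /products/ paths, so any root or bare-locale path is the leaked homepage. A sketch, with the locale prefixes hard-coded from the six markets in this audit:

```python
from urllib.parse import urlparse

# Normalized homepage paths for the six locale markets ("" is the en-us root)
LOCALE_ROOTS = {"", "/en-ca", "/fr-ca", "/en-gb", "/en-mx", "/es-mx"}

def leaked_homepages(product_urls: list[str]) -> list[str]:
    """Return products-sitemap entries that are actually locale homepages."""
    return [u for u in product_urls
            if urlparse(u).path.rstrip("/") in LOCALE_ROOTS]
```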

Blog Sitemap (US Locale)

What this section shows. The 11 blog handles that make up the 1,242-URL US blog sitemap, with an article count per handle plus freshness and locale replication observations.

Blog sitemap composition exposes which content silos are active versus legacy. Recipes (561) and posts (545) do the heavy lifting; one-article handles (anitas-tips-and-techniques, anitas-guides) and the internal /blogs/wholesale handle (6 articles, should not be public) are the cleanup targets. Locale replication of English content across fr-ca / es-mx compounds the problem.

How to read the table. High-count handles are active content silos worth investing in. The highlighted /blogs/wholesale row is tagged for removal. The check-list below names freshness, locale replication, and the wholesale removal as the three remediation items.

| Blog handle | Article count |
|---|---|
| /blogs/recipes | 561 |
| /blogs/posts | 545 |
| /blogs/press-releases | 42 |
| /blogs/awards | 40 |
| /blogs/anitas-recipes | 15 |
| /blogs/anitas-community-stories | 8 |
| /blogs/brother-nature-recipes | 6 |
| /blogs/love-child-blog | 6 |
| /blogs/wholesale | 6 (REMOVE — internal B2B) |
| /blogs/anitas-tips-and-techniques | 1 |
| /blogs/anitas-guides | 1 |
  • Content freshness: 920/1,242 lastmod values (74%) in 2024, 176 (14%) in 2025, 67 (5%) in 2026 — accurately reflects publishing cadence
  • Locale replication of English blog content: all 1,242 blog URLs are replicated across all 6 locale sitemaps. For fr-ca and es-mx where content is served in English with locale prefix, this creates thin/duplicate content at URL level. Recommendation: suppress non-English locale blog URLs from sitemap if content is not translated
  • Remove /blogs/wholesale blog and its 6 articles (6 × 6 = 36 URL instances)
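The per-handle counts can be recomputed from the raw blog sitemap by grouping URL paths on their /blogs/<handle>/ segment, assuming the standard Shopify article URL shape /blogs/<handle>/<article-slug>. A sketch:

```python
from collections import Counter
from urllib.parse import urlparse

def handle_counts(blog_urls: list[str]) -> Counter:
    """Group blog article URLs by their /blogs/<handle>/ segment."""
    counts = Counter()
    for url in blog_urls:
        parts = urlparse(url).path.strip("/").split("/")
        # Count only article URLs (three segments); skip bare blog index pages
        if len(parts) >= 3 and parts[0] == "blogs":
            counts[parts[1]] += 1
    return counts
```

The same grouping run per locale sitemap exposes the replication pattern: identical handle counts in fr-ca and es-mx confirm the English content is simply mirrored under a locale prefix.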

robots.txt + IndexNow Gaps

What this section shows. Two crawler-control gaps: missing AI-crawler directives in robots.txt (GPTBot, ClaudeBot, PerplexityBot, CCBot, Diffbot, Amazonbot), and IndexNow not yet enabled in Shopify Admin.

Crawler-directive clarity is the line between passive and intentional indexing. AI training crawlers are now routinely addressed by name so content licensing stance is explicit; IndexNow pushes product and blog updates to Bing and Yandex within minutes instead of waiting for the next crawl cycle. Both are low-effort additions with durable reputational and freshness payoffs.

How to read the callouts. The warning callout names the AI crawler directives and the recommended block set. The info callout names the IndexNow enable step (5-minute Shopify Admin toggle) and clarifies the Google impact: IndexNow does not reach Google, so Google freshness depends on Googlebot's own crawl scheduling plus sitemap <lastmod> (limited by the product-timestamp issue).

robots.txt gap — no AI crawler directives
GPTBot, ClaudeBot, PerplexityBot, CCBot, and Diffbot are not explicitly addressed. They fall under User-agent: * but major AI training crawlers are now routinely named explicitly for content licensing clarity. Given NP is an organic brand with content investment, blocking AI training crawlers is defensible and increasingly expected in the CPG category. Recommended addition:
```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: Amazonbot
Disallow: /
```
Important: blocking AI training crawlers prevents training data inclusion but does NOT prevent GEO/AI citation from Googlebot-indexed content. Google’s SGE, Perplexity, and ChatGPT-with-search primarily use live search results — not direct crawl — for citations.
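Once the directives ship, Python's standard robots parser can verify them without a browser. A sketch testing the recommended rules (the rule text below is an abbreviated copy of the recommendation, which is the assumption under test):

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""  # abbreviated; the live file would list all six AI crawlers

def is_blocked(agent: str, url: str = "https://naturespath.com/") -> bool:
    """True if the given user agent is disallowed from the URL under RULES."""
    rp = RobotFileParser()
    rp.parse(RULES.splitlines())
    return not rp.can_fetch(agent, url)
```

Because no `User-agent: *` block appears in the snippet, unnamed crawlers such as Googlebot remain allowed, which matches the intent: block AI training crawlers only, leave search crawling untouched.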
IndexNow — enable in Shopify admin (5 minutes)
Shopify natively supports IndexNow via its Bing/IndexNow partnership. Enable it via Online Store > Preferences > Search engine optimization > IndexNow. Every product publish, collection update, and page change is then pinged to Bing and Yandex within minutes. Impact on Google: zero (Google has not adopted IndexNow). For Google, freshness is governed by Googlebot's own crawl scheduling plus sitemap <lastmod>; Search Console no longer exposes a crawl-rate setting. Given NP's product lastmod issue (all values identical), the <lastmod> signal provides no scheduling value for products, leaving organic crawl demand as the only freshness lever for Google.
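Shopify sends the pings automatically once the toggle is on; for reference, the IndexNow protocol itself is just a JSON POST to api.indexnow.org. A sketch of the payload shape (the key value and key-file location below are hypothetical placeholders):

```python
def indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body for a POST to https://api.indexnow.org/indexnow."""
    return {
        "host": host,
        "key": key,
        # The key file must be publicly served from the host to prove ownership
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": urls,
    }
```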

Shopify Sitemap Quirks (Reference)

What this section shows. Three Shopify platform behaviors that affect sitemap output and cannot be fixed in theme code: query-string requirements on child sitemap URLs, auto-inserted <changefreq> tags, and dynamic product <lastmod> timestamps.

Platform-level limits need documented expectations. Spending engineering time fighting Shopify defaults is expensive; the right move is to capture each limit, state the workaround, and file a Shopify support ticket if any of the three ever becomes blocking. Future audit cycles then reference this section instead of re-discovering the same issues.

How to read the callouts. The first callout documents the query-string fetch requirement for anyone re-running the sitemap pull. The second names the three items that cannot be fixed without a custom sitemap solution, and is the reference document to cite in conversations with the NP IT team.

Query-string requirement on child sitemaps
Shopify sitemap URLs require ?from=X&to=Y query parameters; without them the origin returns HTTP 400. The blog sitemap is the exception (the bare path works). Always fetch the sitemap index first and extract the full child URLs, query strings included; don't assume a bare path will work.
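A quick guard when re-running the pull is to confirm each extracted child URL still carries its from/to parameters before fetching (blog sitemaps are exempt per the note above). A sketch:

```python
from urllib.parse import urlparse, parse_qs

def has_range_params(sitemap_url: str) -> bool:
    """True if the child sitemap URL carries the from/to params Shopify requires."""
    qs = parse_qs(urlparse(sitemap_url).query)
    return "from" in qs and "to" in qs
```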
Cannot-fix items without custom sitemap
  • <changefreq> tags auto-inserted by Shopify — Google ignores them, but cannot be removed
  • Product <lastmod> is auto-generated at fetch time, not actual product modification date
  • Homepage leaks into products sitemap as first entry per locale

Implementation Checklist
