For site owners

About the GdFairyBot crawler

Spotted GdFairyBot in your server logs? Here's exactly who we are, what we do on your site, and how to turn us off with a two-line robots.txt rule.

Last updated: May 2026


The short version. GdFairyBot is how datafairy checks that your measurement is actually firing (Step 1 of the journey). It reads your tracking, never your content. Today it only crawls our own site, and a two-line rule in robots.txt stops it cold.

What is GdFairyBot?

GdFairyBot is the headless browser datafairy uses to audit how analytics and ad-tracking infrastructure (GTM, GA4, Meta Pixel, Google Ads, etc.) is wired on a website. We use it to verify that conversion events fire correctly, that consent gates work, and that data flows to ad platforms the way the site owner expects. That is the foundation everything else datafairy does is built on.

Today GdFairyBot only crawls datafairy.ai's own website — our internal validation rig. It will eventually crawl customer sites that have explicitly opted in to scheduled monitoring as part of their datafairy subscription. We do not crawl arbitrary websites, follow links across domains, or attempt to discover sites without an existing relationship.

How to identify GdFairyBot

Every request from the bot includes:

  • User-Agent header: Mozilla/5.0 (compatible; GdFairyBot/0.1; +https://datafairy.ai/crawler)
  • X-Datafairy-Crawl-Run-Id header — a UUID unique to each crawl run. If you spot something concerning in your logs and want to ask us about it, this is the value to send.
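
If you want to pull these two values out of your own request logs, the sketch below shows one way to do it in Python. It assumes your headers are available as a plain dict; the function name identify_gdfairybot and that input shape are illustrative, not part of any datafairy tooling. Keep in mind that a User-Agent string can be spoofed, so a match tells you what the client claims to be, not who it is.

```python
import re
import uuid

# Matches the product token in the User-Agent string quoted above.
BOT_TOKEN = re.compile(r"\bGdFairyBot/(\d+(?:\.\d+)*)")


def identify_gdfairybot(headers):
    """Return (version, run_id) if the request claims to be GdFairyBot, else None.

    `headers` is a plain dict of request headers (hypothetical input shape).
    run_id is None when the header is missing or is not a valid UUID.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    match = BOT_TOKEN.search(lowered.get("user-agent", ""))
    if match is None:
        return None

    run_id = None
    raw = lowered.get("x-datafairy-crawl-run-id")
    if raw:
        try:
            run_id = str(uuid.UUID(raw.strip()))
        except ValueError:
            run_id = None  # present but malformed
    return match.group(1), run_id
```

The run ID returned here is the value to include when you email us about a specific crawl.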

What we capture

  • HTML the browser renders (just like a real browser visit)
  • Network requests to analytics + ad-platform endpoints (GA4 /collect, GTM container loads, Meta Pixel, Google Ads, etc.) — captured as URLs only, not response bodies
  • dataLayer events and gtag calls fired during the page load
  • The result of clicking obvious passive UI affordances — accordions, tabs, disclosure widgets — so interaction-gated analytics tags fire

What we don't do

  • We never click checkout, purchase, or login buttons
  • We never attempt to bypass authentication or paywalls
  • We don't follow links across domains — each crawl is scoped to an explicit URL list
  • We don't store page HTML or content beyond what's needed to render the analytics audit
  • We don't index your content for search or any other purpose — we only inspect tracking infrastructure
  • We never submit synthetic test data to a form unless the domain has explicitly opted in to synthetic-flow validation. When opted in, every test submission uses unmistakable marker values (see below) so customers can filter test traffic out of their CRM.

Synthetic-flow validation (opt-in)

For domains that have explicitly opted in, GdFairyBot can also submit real test data through conversion-bearing forms (sign-up, demo request, lead capture, etc.) to verify end-to-end that your tracking actually fires when a real user submits. This is the difference between "your form has a tag attached" and "submitting your form actually triggers the conversion event in GA4 / Google Ads / Meta." Today this is restricted to datafairy.ai's own domain; per-customer opt-in lands later as part of customer-facing scheduled monitoring.

Every synthetic submission uses these marker values, and they are intentionally unmistakable so you can filter them out of your CRM, lead routing, and email sends:

  • Email: [email protected] — the .fairy top-level domain doesn't exist in the IANA root, so the address can never deliver mail. Easy to grep for.
  • Name: Robot Fairy (also Robot / Fairy for split first/last)
  • Phone: +15555550100, a US-format number in the 555-01XX block reserved for fictional use, so it can never ring a real person
  • Lead source / UTM: datafairy-crawl — surfaces as a dimension you can filter on
  • Notes / comments: include the literal string "datafairy.ai's crawler" and the crawl run ID (the X-Datafairy-Crawl-Run-Id value), so you can correlate one CRM lead row to the exact crawl that produced it

Recommended customer setup before opting in: configure a CRM filter on any one of the markers above and verify it works. Salesforce workflow rule, HubSpot list, Mailchimp filter, Zapier filter step — whatever your stack uses to drop spam / internal-QA leads. Filter on email contains [email protected] or lead_source equals datafairy-crawl. Both work; either is sufficient.
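
If your stack lets you run custom code on incoming leads, the same filter can be written as a small predicate. The following is a minimal Python sketch, assuming a lead arrives as a dict with the field names email, phone, lead_source, and notes. Those field names and the function name is_synthetic_lead are hypothetical; map them to whatever your CRM exports. It checks every marker listed above, although any single one is enough.

```python
def is_synthetic_lead(lead):
    """Return True if a lead record carries any datafairy synthetic marker.

    `lead` is a dict with hypothetical field names: email, phone,
    lead_source, notes. Missing or None fields are treated as empty.
    """
    email = (lead.get("email") or "").strip().lower()
    # Keep digits only so "+1 (555) 555-0100" and "+15555550100" compare equal.
    phone = "".join(ch for ch in (lead.get("phone") or "") if ch.isdigit())
    source = (lead.get("lead_source") or "").strip().lower()
    notes = (lead.get("notes") or "").lower()

    return (
        email.endswith(".fairy")              # non-existent TLD marker
        or phone == "15555550100"             # fictional phone marker
        or source == "datafairy-crawl"        # lead source / UTM marker
        or "datafairy.ai's crawler" in notes  # literal string in notes
    )
```

Run it against a known test record before opting in, so you have confirmed that the filter catches synthetic leads and lets real ones through.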

How submissions are paced: at most 16 submissions per crawl across the whole domain, with at most 8 plans per form. We use equivalence-class fuzzing — one submission per branching-field combination (e.g. one per <select> option, not the cartesian product of every value) — and collapse duplicates as soon as we observe identical resulting tracking events. Most opt-in customers see 1–3 submissions per form, not 8.
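
To make the difference between "one per option" and "the cartesian product" concrete, here is a rough Python sketch of a one-field-at-a-time planner. It is an illustration of the idea only, not GdFairyBot's actual planner: the function name plan_submissions, the input shape, and the choice to treat the first option of each field as its default are all assumptions made for this example. Only the cap of 8 per form comes from the text above.

```python
MAX_PLANS_PER_FORM = 8  # per-form cap quoted above


def plan_submissions(branching_fields, max_plans=MAX_PLANS_PER_FORM):
    """Build submission plans that vary one branching field at a time.

    `branching_fields` maps a field name to its list of options. The first
    option of each field is treated as the default (an assumption of this
    sketch). The result is one baseline plan plus one plan per non-default
    option, truncated to `max_plans`.
    """
    defaults = {name: opts[0] for name, opts in branching_fields.items() if opts}
    plans = [dict(defaults)] if defaults else []

    for name, opts in branching_fields.items():
        for option in opts[1:]:
            plan = dict(defaults)  # every other field stays at its default
            plan[name] = option
            plans.append(plan)

    return plans[:max_plans]
```

For a form with a three-option plan selector and a two-option country selector, this yields 4 plans (one baseline, two extra plan options, one extra country) where the full cartesian product would need 6. Collapsing plans whose resulting tracking events turn out identical would shrink the count further.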

The X-Datafairy-Crawl-Run-Id header is set on every HTTP request from a synthetic submission, including the form POST itself, so you can also correlate by HTTP log. Synthetic submissions are made from the same headless browser that does the rest of the audit; they are not separate processes.

How to opt out

Add this to your robots.txt to block GdFairyBot from your entire site:

User-agent: GdFairyBot
Disallow: /

Or block it from specific paths:

User-agent: GdFairyBot
Disallow: /private/
Disallow: /staging/

You can also slow it down:

User-agent: GdFairyBot
Crawl-Delay: 10

GdFairyBot fetches and respects robots.txt on every origin before any other request. We honor Disallow, Allow, and Crawl-Delay directives. Changes you make take effect within five minutes.
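
Before you deploy a rule, you can sanity-check it locally. The sketch below uses Python's standard-library urllib.robotparser to report whether a given URL is allowed for the GdFairyBot user agent and which crawl delay applies. The helper name check_rules and the example.com URLs are placeholders. Note that this shows how Python's parser reads your file; it is a reasonable approximation, not a guarantee of how any particular crawler interprets edge cases.

```python
from urllib.robotparser import RobotFileParser


def check_rules(robots_txt, url, agent="GdFairyBot"):
    """Return (allowed, crawl_delay) for `agent` under the given robots.txt text.

    crawl_delay is None when no Crawl-Delay applies to the agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


rules = """\
User-agent: GdFairyBot
Disallow: /private/
Disallow: /staging/
Crawl-Delay: 10
"""

print(check_rules(rules, "https://example.com/private/report"))  # (False, 10)
print(check_rules(rules, "https://example.com/pricing"))         # (True, 10)
```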

Rate of access

GdFairyBot is on-demand only — it does not run on a schedule. A typical crawl visits 1–10 pages on an opted-in customer's site once or twice a day, with at least a 1-second delay between requests (longer if your robots.txt sets Crawl-Delay). We are not a search-style crawler; we don't sweep your sitemap.

What if I see GdFairyBot on my site and I'm not a datafairy customer?

That shouldn't happen. As of mid-2026, the crawler is restricted to datafairy.ai's own URLs in code. If you spot the bot on a site you own and you don't have a datafairy subscription, please email us at [email protected] with the X-Datafairy-Crawl-Run-Id value from your logs and we'll investigate.

Contact

Questions, concerns, or want to verify a specific crawl? Email [email protected].