
The Misuse of Geo-Holdout Tests: A Guide for Non-Technical Marketing Leaders

I’ve written this article specifically for non-technical marketing leaders like Heads of Digital, CEOs, and Founders (here is my previous, more technical article on this topic).
My goal is to help you avoid misusing geo-holdout tests, which are meant to assess whether a marketing action drives incremental results — beyond what would’ve happened anyway — not to precisely measure how much incrementality it delivers.
Let’s break it down.
What Are Geo-Holdout Tests?
Geo-holdout tests sound simple: you divide geographic regions into two groups:
- Test Group: Where your marketing campaign (e.g., YouTube ads) runs.
- Control Group: Where you intentionally withhold that campaign.
You then compare outcomes — sales, conversions, or whatever you’re measuring — between these groups to estimate the “lift” your campaign generates. Easy enough, right? But here’s the catch, and why this matters to you as a non-technical leader: this approach is often misused, leading to decisions that could cost your company dearly.
Why Geo-Holdout Tests Get Misused
Geo-holdout tests can point you in a general direction, but too many top managers treat them as a precise tool for measurement. Here’s why that’s a HUGE mistake:
- Regions Aren’t Twins: No two regions are identical. Your test region might have a thriving economy while your control region struggles. Demographics — like age, income, or even weather — could differ too. If sales jump in the test region, is it your ads or just local conditions? It’s hard to tell.
- Cross-Contamination Across Channels: Keeping a control group truly “ad-free” for one channel is nearly impossible today. If you withhold YouTube ads in California, your other channels — like Facebook or Google Ads — might automatically compensate, targeting that audience more aggressively. This doesn’t just muddy your control group; it can skew your tracking pixels and disrupt your broader marketing strategy. More on this below.
- In-Channel Contamination: Even within the channel you’re testing, such as YouTube, excluding specific regions from targeting can introduce complications. For instance, if you withhold YouTube ads in California with a defined daily budget, that unused budget doesn’t simply disappear. Instead, the platform may redirect it to the remaining regions, including your test group in Texas. Texas can end up receiving double the intended ad spend, significantly amplifying exposure there. Consequently, the results may appear inflated, suggesting the ads are more effective than they would be under normal conditions. This undermines the geo-holdout test’s integrity, as the test region no longer reflects your normal spend levels, leading to an overestimation of the campaign’s true impact.
- Not Enough Data to Work With: Unlike user-level A/B tests with thousands of data points, geo-tests use just a handful of regions — think states or cities. This small sample size makes it tough to detect subtle but meaningful effects, especially with a limited budget.
- Life Happens: External factors — holidays, local events, or seasonal trends — can hit one region harder than another. If your test group experiences a natural sales surge during the test, you might wrongly credit your ads.
- Reading Too Much Into It: This is where I see leaders stumble most: treating geo-tests as definitive proof of a campaign’s value. They can suggest whether a channel adds value (yes or no), but they’re unreliable for pinpointing how much! Overrelying on them sets you up for risky calls.
The Margin of Error: A Real-World Wake-Up Call
Even a well-executed geo-test comes with a significant margin of error — something you, as a non-technical leader, need to understand before staking your budget on it.
Imagine your company spends $200,000 a month on YouTube ads, with total monthly revenue from all marketing at $10 million. A geo-test shows a 5% incremental lift from YouTube ads, with a ±4% margin of error (most geo-holdout test providers hide this from you!).
Here’s what that means:
- 5% Lift: That’s $500,000 extra revenue (5% of $10M) from YouTube ads.
- ±4% Margin: The true lift could be anywhere from 1% ($100,000) to 9% ($900,000).
Now, let’s look at return on ad spend (ROAS), which you likely track closely:
- At 5% lift, ROAS is 2.5 ($500,000 revenue / $200,000 spend) — decent.
- At 1% lift, ROAS falls to 0.5 ($100,000 / $200,000) — you’re in the red.
- At 9% lift, ROAS climbs to 4.5 ($900,000 / $200,000) — a slam dunk.
The problem? This test confirms YouTube ads do something, but the range is so wide you can’t confidently set your budget. Treat that 5% as fact, and you might overinvest in a loser — or cut a winner short.
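If you want to sanity-check a provider’s numbers yourself, the arithmetic above fits in a few lines. This is a minimal sketch using the hypothetical figures from this example ($200K/month YouTube spend, $10M total monthly revenue, a measured 5% lift with a ±4% margin):

```python
# Hypothetical figures from the example above — not real campaign data.
def roas(lift_pct, total_revenue=10_000_000, spend=200_000):
    """Incremental revenue implied by the lift, divided by channel spend."""
    incremental_revenue = total_revenue * lift_pct / 100
    return incremental_revenue / spend

measured_lift, margin = 5, 4  # the geo-test's point estimate and its error band
for lift in (measured_lift - margin, measured_lift, measured_lift + margin):
    incremental = 10_000_000 * lift / 100
    print(f"lift {lift}% -> incremental ${incremental:,.0f}, ROAS {roas(lift):.1f}")
```

Running it reproduces the spread above: the same test result is consistent with a ROAS anywhere from 0.5 (losing money) to 4.5 (a clear winner).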
Most geo-holdout test providers take the misuse of these tests even further — they apply a so-called “incrementality coefficient” to your ad platform reporting, which can screw your analytics even more, especially given the low numbers your platforms report and the week-to-week variance in those figures.
For example, if your YouTube platform attribution shows that YouTube contributes to 1% of your revenue, and your geo-holdout test shows a 5% incremental lift, some providers will apply a 5x coefficient to YouTube’s numbers, suggesting it actually drives 5% of revenue. In theory, this adjusts the platform’s reported contribution to align with the geo-test’s findings. But here’s why it’s dangerous, particularly when you’re playing with small numbers:
- Shaky Foundations: Geo-tests already have a wide margin of error — like that ±4% we discussed. A 5% lift could really be 1% or 9%. Multiplying YouTube’s 1% by 5x based on a potentially off-base estimate can massively overstate its impact — or understate it if the true lift is lower.
- Amplifying Volatility: Platform attribution numbers, like that 1%, are often tiny and fluctuate weekly due to seasonality, promotions, or random user behavior. If one week YouTube’s attribution spikes to 3% from a fluke, a 5x coefficient would claim a 15% contribution — wildly misleading when the next week it drops back down.
- Mixing Apples and Oranges: Geo-tests measure causal lift, while platform attribution often leans on correlation. Applying a coefficient from one to the other ignores this mismatch, distorting your view of what’s really driving results (for example, lift might have been caused by completely different campaigns that didn’t even have attribution-reported conversions).
This practice can trick you into overfunding a channel based on inflated stats or slashing one that’s actually pulling its weight. With small numbers, these errors don’t just add up — they multiply, throwing your strategy into chaos.
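To see how quickly a fixed coefficient amplifies noise, here is a minimal sketch. The weekly attribution figures are made up for illustration; the 5x multiplier and ±4% margin come from the example above:

```python
# Illustrative weekly platform-attributed revenue share (%), with one fluke week.
platform_attribution = [1.0, 0.8, 3.0, 1.1]  # hypothetical numbers
coefficient = 5.0 / 1.0  # geo-test lift (5%) divided by typical attribution (1%)

# Applying a fixed coefficient scales every fluctuation along with the signal:
adjusted = [round(share * coefficient, 1) for share in platform_attribution]
print(adjusted)  # the fluke 3% week now claims a 15% revenue contribution

# The coefficient itself is shaky too: with a ±4% margin of error,
# the "true" multiplier could be anywhere from 1x to 9x.
for true_lift in (1.0, 5.0, 9.0):
    print(f"true lift {true_lift}% -> coefficient {true_lift / 1.0:.0f}x")
```

The point of the sketch: a coefficient built from one noisy estimate and applied to another noisy series multiplies both sources of error instead of correcting either.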
Consider the True Cost of Such Tests!
Running a geo-holdout test isn’t free — it can hit your revenue hard. For example, if you withhold ads in 50% of states for 21 days, and your ads truly drive a 9% lift on a monthly marketing mix revenue of $10M, here’s the cost:
Monthly revenue with ads running: $10M. Without ads, revenue drops to $10M ÷ 1.09 ≈ $9.174M (since the 9% lift means revenue with ads is 109% of the no-ads baseline).
Incremental revenue from ads: $10M - $9.174M = $826,000 per month.
In 50% of states: $826,000 × 50% = $413,000. For 21 days (70% of a 30-day month): $413,000 × (21 ÷ 30) ≈ $289,000.
That’s roughly $289,000 in forgone incremental revenue — the true cost of this test. Use these tests only if you’re genuinely uncertain whether your ads are incremental at all.
Otherwise, you’re needlessly screwing your bottom line.
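The cost estimate above can be reproduced in a few lines; all figures are the hypothetical ones from this example (ads withheld in 50% of states for 21 days, a true 9% lift, $10M monthly revenue with ads):

```python
# Forgone-revenue estimate for the hypothetical geo-holdout test above.
monthly_revenue = 10_000_000  # revenue with ads running
true_lift = 0.09              # ads lift revenue 9% above the no-ads baseline
holdout_share = 0.50          # fraction of states where ads are withheld
test_days, month_days = 21, 30

baseline = monthly_revenue / (1 + true_lift)  # revenue without ads (~$9.174M)
incremental = monthly_revenue - baseline      # ads' contribution (~$826K/month)
cost = incremental * holdout_share * (test_days / month_days)
print(f"forgone incremental revenue ~ ${cost:,.0f}")
```

Swap in your own revenue, lift estimate, holdout share, and test length to price a test before committing to it.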
Key Takeaways for Top Managers
As a non-technical leader — whether a Founder, CEO, or Head of Digital — here’s how to use geo-tests wisely:
- Direction, Not Details: Use them to see if a channel’s worth pursuing (yes/no), not to nail down exact returns. And only when you’re uncertain whether your ads are incremental at all (like TV ads, out-of-home ads, etc.).
- Mind the Margin: A 5% lift with ±4% error means it could be 1% or 9% — account for that uncertainty (it is far too large to plug into actual iROAS calculations!).
- Stay Humble: Don’t let one geo-test dictate your strategy. It’s a piece of the puzzle, not the whole picture. And beware of anyone telling you that this is a reliable method to measure incremental ROAS!
Wrapping Up
Geo-holdout tests can be useful when handled correctly, but they’re a minefield if you chase precision. My aim here is to equip you — non-technical marketing leaders — with the insight to recognize their limits.
Focus on whether a channel moves the needle, not how far, and you’ll guide your team toward smarter, safer decisions. Misuse these tests, and you’re gambling with your marketing budget.