A/B testing is the scientific method applied to marketing. Instead of arguing about whether the green button or the red button will work better, you show both to comparable audiences and measure which actually converts more.
How A/B testing works
- Define the metric you care about (form submissions, purchases, clicks, signups)
- Make ONE change to ONE element (one button color, one headline, one form field count)
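- Randomly split incoming traffic 50/50 between original (control) and modified (variant)
- Run until you reach the sample size you planned in advance, then check for statistical significance, typically at a 95% confidence level (a difference this large would appear by chance less than 5% of the time if the change had no real effect)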
- Implement the winner, log the loser, design the next test
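The split and the significance check in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular testing platform works: the function names, the `experiment` label, and the hash-based bucketing are assumptions made for the example, and the test shown is a textbook two-proportion z-test.

```python
import hashlib
import math


def assign_variant(user_id: str, experiment: str = "button-color") -> str:
    """Place a visitor in 'control' or 'variant' (hypothetical helper).

    Hashing the ID means the same visitor always sees the same version.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Even/odd on the hash value gives a stable, roughly 50/50 split.
    return "control" if int(digest, 16) % 2 == 0 else "variant"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0, 1.0  # no conversions (or all conversions): nothing to compare
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal curve
    return z, p_value


# Example with made-up counts: 500 of 10,000 vs 600 of 10,000 conversions.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=600, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 95%: {p < 0.05}")
```

A p-value below 0.05 corresponds to the 95% confidence threshold; the check is only trustworthy if it is run once, at the sample size chosen before the test started.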
The traffic requirement
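This is what kills most small-business A/B testing. To detect a 20% relative improvement in conversion rate at 95% confidence (and the conventional 80% statistical power), the standard two-proportion sample-size formula gives roughly:
- ~8,000 visitors per variation if base conversion rate is 5%
- ~21,000 visitors per variation if base conversion rate is 2%
- ~43,000 visitors per variation if base conversion rate is 1%
In each case that works out to roughly 400 to 500 conversions per variation, and the totals double because there are two variations. So a site with 200 conversions/month at a 1% conversion rate (about 20,000 visitors/month) would need more than four months of traffic to detect a 20% improvement with rigor. Smaller effects are far more expensive: detecting a 5% improvement takes roughly 15 times as much traffic, which is more than five years for the same site. That's why true A/B testing is mostly useful for higher-traffic sites.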
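For readers who want to run the numbers for their own site, here is a rough sketch of the textbook sample-size calculation for comparing two conversion rates. It assumes a two-sided test and 80% power (the power level is an assumption, not something every calculator shares), it counts visitors rather than conversions, and the function names are made up for this example. Commercial calculators may use slightly different formulas, so treat the output as an estimate.

```python
import math
from statistics import NormalDist


def visitors_per_variation(base_rate, relative_lift, confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH variation (normal approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)  # rate we hope the variant reaches
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def months_needed(per_variation, monthly_visitors, variations=2):
    """How long the test runs if all site traffic is split across the variations."""
    return per_variation * variations / monthly_visitors


for base_rate in (0.05, 0.02, 0.01):
    n = visitors_per_variation(base_rate, relative_lift=0.20)
    print(f"base rate {base_rate:.0%}: ~{n:,} visitors per variation")

# A site converting 1% of about 20,000 monthly visitors (200 conversions/month):
n = visitors_per_variation(0.01, relative_lift=0.20)
print(f"about {months_needed(n, monthly_visitors=20_000):.1f} months of traffic")
```

Under these assumptions the three base rates come out on the order of 8,000, 21,000, and 43,000 visitors per variation. Changing `relative_lift` to 0.05 shows how quickly the requirement grows for small improvements.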
What lower-traffic sites should do instead
- Bayesian / sequential testing: alternatives to traditional fixed-sample frequentist A/B testing that can support a decision earlier when the effect is large, though they cannot create evidence that a very small sample doesn't contain
- Before-after analysis: ship the change you believe is better, then measure conversion over comparable time periods (less rigorous but practical)
- Qualitative research: user testing, heatmaps, session recordings, and exit surveys often produce better insights than underpowered A/B tests
- Major-change tests only: test big changes (entirely new page structure) rather than tiny ones (button colors), because larger effects need far less traffic to detect
- Multivariate testing: test multiple combinations at once when traffic allows; this can be more efficient than running A/B tests one after another, but each combination gets a smaller share of visitors, so it generally needs more traffic than a single A/B test
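To make the Bayesian option above concrete, here is a small sketch that estimates the probability that the variant's true conversion rate is higher than the control's. It assumes a uniform Beta(1, 1) prior and uses simple Monte Carlo sampling from the standard library; the function name and the example counts are invented for illustration, and real tools typically add decision rules on top of this number.

```python
import random


def prob_variant_beats_control(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Estimate P(variant rate > control rate) under a uniform prior."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        # Beta(1 + conversions, 1 + non-conversions) is the posterior for each rate.
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Made-up counts for a low-traffic site: 18 of 900 vs 27 of 900 conversions.
chance = prob_variant_beats_control(conv_a=18, n_a=900, conv_b=27, n_b=900)
print(f"Estimated probability the variant is better: {chance:.0%}")
```

The output is a probability rather than a pass/fail verdict, which is often easier to act on with small samples. It does not remove the underlying uncertainty; it only states it more directly.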
Common A/B testing mistakes
- Stopping early: A/B tests often look like they have winners after 200 visitors, but those "winners" frequently vanish as more data arrives
- Testing too many things at once: when you change 5 things and conversion improves, you don't know which change caused it
- Ignoring statistical significance: "B converted 12% better than A" means little without a confidence interval and the sample size behind it
- Testing during anomalies: holidays, sales events, and traffic spikes throw off normal patterns
- Not segmenting: sometimes B wins on desktop and loses on mobile, so the aggregate result misleads
- Local maxima: iterating tiny improvements forever without testing bigger changes that could unlock larger gains
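The first mistake in the list, stopping early, is easy to demonstrate with a simulation. The sketch below runs A/A tests, where both versions are identical and any "winner" is a false alarm, and compares two habits: checking significance every 100 visitors and stopping at the first hit, versus checking once at the planned end. All parameters (300 runs, 2,000 visitors per arm, a 5% conversion rate) are arbitrary choices for illustration.

```python
import math
import random


def is_significant(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Two-proportion z-test at the 95% level (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return False  # no conversions yet, so nothing to compare
    return abs(conv_b / n_b - conv_a / n_a) / std_err > z_crit


def simulate_aa_tests(runs=300, visitors_per_arm=2000, peek_every=100, rate=0.05, seed=1):
    """Return (false-winner rate when peeking, false-winner rate at the planned end)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        saw_winner = False
        for n in range(1, visitors_per_arm + 1):
            # Both arms share the same true rate, so any difference is noise.
            conv_a += int(rng.random() < rate)
            conv_b += int(rng.random() < rate)
            if n % peek_every == 0 and is_significant(conv_a, n, conv_b, n):
                saw_winner = True  # someone peeking would have stopped here
        peeking_hits += int(saw_winner)
        final_hits += int(is_significant(conv_a, visitors_per_arm, conv_b, visitors_per_arm))
    return peeking_hits / runs, final_hits / runs


peeking_rate, final_rate = simulate_aa_tests()
print(f"False winners when peeking every 100 visitors: {peeking_rate:.0%}")
print(f"False winners when checking once at the end:  {final_rate:.0%}")
```

Checking once at the end should produce false winners at close to the nominal 5% rate, while repeated peeking declares a winner far more often even though nothing changed between the two versions.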
Tools
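Common A/B testing platforms: VWO, Optimizely, Convert, AB Tasty. Google Optimize was discontinued in September 2023; Google now points users to paid third-party tools that integrate with GA4. For ecommerce specifically: Shopify's native testing, plus Northbeam and Triple Whale, which are primarily attribution and analytics platforms rather than split-testing tools.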