| Over-segmentation | 20+ slices, some with N < 10; confidence intervals too wide to trust | Flag segments with N < 30 as unreliable; merge sparse segments |
| Confusing correlation with causation | Executives show lower quality, but they also ask harder questions | Check interaction effects; segment by dimension pairs |
| Ignoring segment frequency | Multi-hop at 40% but only 2% of queries | Rank by impact score (gap × frequency), not gap alone |
| Static analysis on shifting distributions | Month-old data; user behavior has changed since | Use time-series decomposition to check stability |
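
Two of the fixes in the table, flagging segments with N < 30 and ranking by impact score (gap × frequency), can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the `Segment` class, the `rank_by_impact` helper, the baseline of 0.80, and the sample data are all hypothetical, and the gap is assumed to be the baseline quality minus the segment's quality.

```python
from dataclasses import dataclass

MIN_SAMPLES = 30  # threshold from the table: segments with N < 30 are flagged


@dataclass
class Segment:
    name: str
    n: int          # number of queries in the segment
    quality: float  # mean quality score in [0, 1]


def rank_by_impact(segments, baseline):
    """Return (name, impact, reliable) tuples sorted by impact, largest first.

    impact = gap * frequency, where gap = baseline - quality (assumed
    definition) and frequency = the segment's share of all queries.
    """
    total = sum(s.n for s in segments)
    rows = []
    for s in segments:
        gap = baseline - s.quality
        frequency = s.n / total
        rows.append((s.name, gap * frequency, s.n >= MIN_SAMPLES))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical data: multi-hop scores 40% but makes up only 2% of queries.
segments = [
    Segment("multi-hop", n=20, quality=0.40),
    Segment("lookup", n=600, quality=0.70),
    Segment("summarization", n=380, quality=0.85),
]
ranking = rank_by_impact(segments, baseline=0.80)
for name, impact, reliable in ranking:
    note = "" if reliable else " (N < 30, unreliable)"
    print(f"{name}: impact={impact:.3f}{note}")
```

With this made-up data, the multi-hop segment has the largest gap but ranks below the lookup segment once frequency is taken into account, and it is also flagged as unreliable because it has fewer than 30 samples.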