Week 4 Lesson 4 AI Evals for Product Dev Shane Butler AI Analyst Lab
Aggregate SQL correctness: 88%.
Segments: simple (n=200, 95%), multi-join (n=150, 84%), advanced (n=50, 62%).
PM says '88% is above our 85% threshold, let's ship.'
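
One way to test the PM's reasoning is to recompute the aggregate from the segments and compare each segment to the threshold on its own. The sketch below is illustrative only: the dictionary layout and the helper names `weighted_aggregate` and `segments_below` are assumptions, not part of the lesson material. Note that the segment figures as quoted give a size-weighted average of 86.75%, not exactly the quoted 88%. The source does not explain the gap, so the code works from the segment numbers alone.

```python
# Segment sizes and correctness rates exactly as quoted in the scenario.
# The data structure and function names are hypothetical, for illustration.
segments = {
    "simple": (200, 0.95),
    "multi-join": (150, 0.84),
    "advanced": (50, 0.62),
}
THRESHOLD = 0.85  # the PM's ship threshold


def weighted_aggregate(segs):
    """Size-weighted correctness across all segments."""
    total = sum(n for n, _ in segs.values())
    correct = sum(n * rate for n, rate in segs.values())
    return correct / total


def segments_below(segs, threshold):
    """Names of segments whose own correctness is under the threshold."""
    return [name for name, (_, rate) in segs.items() if rate < threshold]


aggregate = weighted_aggregate(segments)
failing = segments_below(segments, THRESHOLD)

print(f"weighted aggregate: {aggregate:.2%}")
print(f"segments below {THRESHOLD:.0%}: {failing}")
```

Under these numbers the aggregate clears 85% only because the simple segment makes up half the sample. Multi-join (84%) and advanced (62%) both fall short on their own, so shipping on the aggregate alone would hide where the model actually fails.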