
Week 2: Instrumentation and Reliability Engineering · Lesson 2.2

How little instrumentation is enough?

When is input/output logging enough, and when do I need deeper traces?

Retired course. Due to the fast pace of AI, this course was retired before full release. Exercises, datasets, and videos referenced in this lesson are not available. The slide content and frameworks remain free to study.


Reader Notes

Week 2, Lesson 2 addresses the question nobody asks explicitly but everybody is thinking: how many test queries are needed before there is enough confidence to ship?

This matters because every sample costs time and money. If a subject matter expert is writing oracle SQL and verifying outputs, that's real budget and real hours. But the alternative, shipping with insufficient evidence, means genuinely not knowing whether an AI feature sits at 75% quality or 85% quality. That difference matters: 75% means one in four queries is wrong, while 85% means roughly one in seven. Those are very different user experiences.

Most teams guess. They say "let's run a hundred" or "fifty sounds good." No math behind it, just vibes. This lesson replaces vibes with a formula. By the end, the minimum viable sample size for a specific decision at a required confidence level becomes a calculation: exactly how much evidence is needed, and not a sample more.
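The lesson's own formula isn't reproduced in these notes, but the standard normal-approximation sample-size formula for estimating a proportion gives the flavor of the calculation. This is a sketch: the function name, its defaults, and the choice of the worst-case pass rate p = 0.5 are illustrative assumptions, not the course's exact method.

```python
import math

def min_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n such that a two-sided confidence interval for a
    pass rate (normal approximation) has half-width <= margin.

    p=0.5 is the worst case: it maximizes the variance p*(1-p),
    so the result is conservative for any true pass rate.
    """
    # z-scores for common two-sided confidence levels (hard-coded
    # to avoid a scipy dependency in this sketch)
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
    # n >= z^2 * p * (1 - p) / margin^2, rounded up to a whole sample
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Distinguishing 75% from 85% quality needs a margin of about
# +/- 5 percentage points at 95% confidence:
print(min_sample_size(0.05))   # -> 385 samples
# A looser +/- 10 point margin is far cheaper:
print(min_sample_size(0.10))   # -> 97 samples
```

Note how sharply the cost scales: halving the margin roughly quadruples the required sample size, which is exactly why "let's run a hundred" often cannot separate a 75% feature from an 85% one.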
