
Week 2: Instrumentation and Reliability Engineering · Lesson 2.1

What to log and why it matters for AI evaluation

What evidence do I need to capture so evaluation is possible and decisions are auditable?

Retired course. Due to the fast pace of AI, this course was retired before full release. Exercises, datasets, and videos referenced in this lesson are not available. The slide content and frameworks remain free to study.


Reader Notes

Week 2, Lesson 1 opens with a question every product team eventually faces: "What data do you need us to log?" Last week covered failure mode discovery in the AI Data Analyst, tracing through all the ways it could break. This week answers the instrumentation question: what must be logged to make evaluation possible?

The stakes are high. Instrumentation is the contract between product, evaluation, and engineering: the agreement about exactly which fields get logged. Get it right and measurement is unlocked. Get it wrong and Week 3's exercises will fail because the data doesn't exist. Engineering can implement the logging in a single sprint, but the decision is not reversible: traces that aren't logged today are gone forever.

This is the lesson that separates "we should evaluate" from "we CAN evaluate." Evaluation, in this context, means measuring whether an AI feature is actually working, which is what this entire course builds toward.
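The logging contract described above can be sketched as a minimal trace schema. Everything here is an illustrative assumption, not the course's actual field list: the point is that the fields are agreed on explicitly and written down before engineering implements them.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional
import json
import uuid

# Hypothetical trace record for an AI feature. The field names are
# assumptions for this sketch, not a schema prescribed by the lesson.
@dataclass
class TraceRecord:
    user_input: str                 # what the user asked
    model_output: str               # what the AI returned
    model_version: str              # which model/prompt version answered
    latency_ms: float               # how long the call took
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    user_feedback: Optional[str] = None  # e.g. thumbs up/down, if collected

    def to_json(self) -> str:
        """Serialize one trace as a JSON line for the evaluation pipeline."""
        return json.dumps(asdict(self))

record = TraceRecord(
    user_input="Plot revenue by region",
    model_output="SELECT region, SUM(revenue) FROM sales GROUP BY region",
    model_version="analyst-v0.3",
    latency_ms=842.0,
)
print(record.to_json())
```

A schema like this is cheap to implement, but every field omitted from it is evidence that can never be recovered, which is why the contract is negotiated up front.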
