
Week 1: Foundations and Economics · Lesson 1.2

Product evaluation framework for AI systems

What is the full evaluation surface for an AI feature, from inputs to user outcomes?

Retired course. Given the fast pace of AI development, this course was retired before its full release. Exercises, datasets, and videos referenced in this lesson are no longer available; the slide content and frameworks remain free to study.


Reader Notes

Lesson 1.2 builds on the previous lesson's finding that AI systems produce different outputs on every run. That variance was quantified with pass@k and reliable@k. But knowing the system is unreliable does not reveal WHERE it is unreliable. Is the SQL wrong? Is the retrieval pulling bad context? Is the narrative fabricating numbers? Problems that cannot be located cannot be fixed.

This lesson provides the map. The goal is to build an evaluation surface map: a complete inventory of every way an AI feature can fail, every way an attacker could exploit it, and every blind spot in current testing. By the end, the resulting artifact answers the two questions a PM will inevitably ask: "What are all the ways this can fail?" and "What haven't we tested yet?"
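For reference, both metrics carried over from the previous lesson can be computed directly from n recorded runs of which c passed. The pass@k estimator below is the standard unbiased form from Chen et al. (2021); the reliable@k shown (the chance that all k sampled runs pass) is one plausible reading and an assumption here, since this lesson does not restate the course's exact formula.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Chance that at least one of k runs, sampled without replacement
    from n recorded runs (c of which passed), is correct.
    Standard unbiased estimator from Chen et al. (2021)."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some sample must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

def reliable_at_k(n: int, c: int, k: int) -> float:
    """Chance that ALL k sampled runs pass -- an assumed reading of
    reliable@k; the course's exact definition may differ."""
    if c < k:
        return 0.0  # not enough passing runs for an all-pass sample
    return comb(c, k) / comb(n, k)

# Example: 100 runs, 85 passed. One good run in 5 is near-certain,
# but 5-for-5 reliability is much rarer.
print(pass_at_k(100, 85, 5))      # ~0.99996
print(reliable_at_k(100, 85, 5))  # ~0.436
```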
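The evaluation surface map itself can start as a plain structured inventory. The schema below is a hypothetical illustration, not the course's artifact (the field names and entries are invented for this sketch): each entry records a pipeline stage, a failure mode, a possible exploit, and whether current evals cover it, so the two PM questions reduce to filtering the list.

```python
from dataclasses import dataclass

@dataclass
class SurfaceEntry:
    stage: str          # e.g. "retrieval", "SQL generation", "narrative"
    failure_mode: str   # how this stage can produce a wrong output
    exploit: str        # how an attacker could trigger it deliberately
    tested: bool        # whether current evals cover this entry

surface_map = [
    SurfaceEntry("SQL generation", "query joins the wrong table",
                 "prompt injection via column names", tested=True),
    SurfaceEntry("retrieval", "stale or irrelevant context retrieved",
                 "poisoned documents in the index", tested=False),
    SurfaceEntry("narrative", "numbers fabricated in the summary",
                 "adversarial phrasing elicits confident fabrication",
                 tested=False),
]

# "What haven't we tested yet?" falls out of the map directly.
for entry in (e for e in surface_map if not e.tested):
    print(f"blind spot: {entry.stage} -> {entry.failure_mode}")
```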
