
Week 1: Foundations and Economics · Lesson 1.3

Failure surfaces and annotation-based analysis

Where do AI systems break in practice, and how do we turn failures into structured evidence?

Retired course. Due to the fast pace of AI, this course was retired before full release. Exercises, datasets, and videos referenced in this lesson are not available. The slide content and frameworks remain free to study.


Reader Notes

Lesson 1.3 builds on two foundations. Lesson 1.1 quantified non-determinism: the gap between what the system CAN do and what it DOES do consistently. Lesson 1.2 mapped the evaluation surface, covering every pipeline stage where failures could occur. But a gap remains between those two: we predicted where failures COULD happen, yet we have not systematically found the failures that ARE happening. That difference matters.

This lesson closes the gap. We examine raw traces (recorded end-to-end logs of user queries flowing through the AI Data Analyst v0) to discover failure categories from scratch. No predefined checklist. No assumptions about what we will find. Just traces, observations, and patterns. By the end, you will have a failure taxonomy: a categorized inventory of every way the system is actually breaking, with each category carrying a triage label that says what to do about it.
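The workflow described above (annotate traces with free-form observations, group recurring observations into categories, then triage each category) can be sketched in code. This is a minimal illustration, not the course's actual tooling: the `Trace` structure, the sample notes, and the `triage` rule are all hypothetical, invented here to show the shape of the process.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical minimal record for one end-to-end trace.
@dataclass
class Trace:
    trace_id: str
    query: str
    response: str
    notes: list[str] = field(default_factory=list)  # open-ended observations

# Step 1: annotate traces with whatever you observe -- no predefined checklist.
traces = [
    Trace("t1", "revenue by month", "SELECT ...", ["wrong date column"]),
    Trace("t2", "top customers", "Sorry, I can't help", ["unnecessary refusal"]),
    Trace("t3", "revenue by quarter", "SELECT ...", ["wrong date column"]),
]

# Step 2: recurring observations become failure categories with counts.
category_counts = Counter(note for t in traces for note in t.notes)

# Step 3: attach a triage label per category (assumed rule, for illustration:
# anything seen more than once is worth fixing now, the rest is monitored).
def triage(count: int) -> str:
    return "fix-now" if count > 1 else "monitor"

taxonomy = {
    cat: {"count": n, "triage": triage(n)}
    for cat, n in category_counts.items()
}
print(taxonomy)
```

In practice the grouping step is the hard, human part: the annotator reads raw notes and merges near-duplicates into named categories before any counting happens; the code only captures the bookkeeping around that judgment.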
