Week 1: Foundations and Economics
Week 2: Instrumentation and Reliability Engineering
- 2.1 What to log and why it matters for AI evaluation
- 2.2 How little instrumentation is enough?
- 2.3 Trace design and reproducibility
- 2.4 The regression safety net — evaluation in CI/CD
- 2.5 How instrumentation requirements differ by system type
- 2.6 Designing instrumentation that scales with the product
Week 3: Rigorous Measurement of Output Success and Failure
- 3.1 Grounding evaluation in user value
- 3.2 Ground truth sources, regression suites, and synthetic data
- 3.3 Deriving evaluation signals from available ground truth
- 3.4 Similarity metrics and retrieval-specific metrics
- 3.5 Semantic metrics with human and model judges
- 3.6 Scaling semantic evaluation with statistical confidence
Week 4: Metric Design and Business Outcome Linkage
- 4.1 Metric strategy — blocking metrics vs optimization metrics
- 4.2 Metric design patterns for AI features
- 4.3 Metric validation — correlation to outcomes and sensitivity to change
- 4.4 Segmentation strategy for AI systems
- 4.5 Driver analysis — explaining variance and choosing what to change first
- 4.6 Metric specifications, thresholds, baselines, and release criteria
Week 5: Pipelines, Experiments, and Continuous Validation
- 5.1 Evaluation pipeline architecture and environments
- 5.2 Test set strategy and dataset lifecycle
- 5.3 Experiment design for stochastic systems
- 5.4 Launch readiness and rollout gates
- 5.5 Monitoring for drift and regressions
- 5.6 Building evaluation automation end-to-end
- 5.7 Capstone lab — evaluation pipeline build
Week 6: Decision-Making and Organization
Evaluation pipeline architecture and environments
How do we operationalize evaluation so it is repeatable, queryable, and tied to outcomes?
Retired course. Because of the fast pace of AI, this course was retired before its full release. The exercises, datasets, and videos referenced in this lesson are no longer available, but the slide content and frameworks remain free to study.
Finished all 36 lessons? Take the exam and get your free AI Evals certification.