Dataset Lifecycle

Lesson 5.2
Week 5 · Evaluation Infrastructure
Shane Butler · AI Analyst Lab

In Lesson 5.1, you built a transition failure matrix for the AI Data Analyst pipeline. Which pipeline transition had the highest failure rate, and what does that tell you about where to instrument more carefully in future versions?

Query (natural language) → Retrieval (schema fetch) → SQL (query gen) → Execution (run query) → Viz (chart render) → Narrative (explanation)
Each transition edge: Where did failures cluster?

78% offline, 52% in production — the test set drifted and no longer matched production

Pre-Launch Testing (Q1): 200 test cases from 6 months ago, dominated by simple lookup queries. Pass rate: 78%.
Production (Q3): users now ask multi-table joins and trend comparisons. Pass rate: 52%.
The drift arrow: 6 months. Your offline metric measured yesterday's distribution, not today's.

Four stages with disciplined transitions keep your evaluation datasets valid across the life of your product

  • Creation: split data first
  • Development: version everything
  • Test Set Management: refresh regularly
  • Retirement: archive with full history
Each transition has a decision gate — you don't skip stages or cycle backward

Split-first rule — partition data into train/dev/holdout before any metric development

traces_v1_full.jsonl: 1000 traces from production
  • Train (10%): 100 traces, for training the AI judge (the system that checks outputs)
  • Dev (45%): 450 traces, where you refine and test quality checks
  • Holdout (45%): 450 traces, LOCKED, read-only until final pre-ship validation
Split BEFORE any judge development or metric calibration. Otherwise you're teaching to the test: your quality metrics look good on the data you tuned against, but fail on new data.
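The split-first rule can be sketched in a few lines. This is a minimal illustration, not the lab's required code; the file and field names are assumptions.

```python
# Minimal sketch of the split-first rule: partition production traces into
# train / dev / holdout BEFORE any judge or metric development.
import random

def split_traces(traces, seed=13):
    """Shuffle once, then cut 10% / 45% / 45% (train / dev / holdout)."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = traces[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.10)
    n_dev = int(n * 0.45)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])

traces = [{"id": i} for i in range(1000)]  # stand-in for traces_v1_full.jsonl
train, dev, holdout = split_traces(traces)
print(len(train), len(dev), len(holdout))  # 100 450 450
```

Locking the holdout is a process rule, not a code feature: once written, the holdout file should be read-only until final pre-ship validation.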

Versioning begins here — every dataset change gets a version tag and changelog entry

v1.0 (2025-09-15): 200 traces, initial dataset from production sample
v1.1 (2025-10-10): 220 traces, added 20 multi-table join queries
v1.2 (2025-11-01): 220 traces, corrected 5 labels after judge review
Train and Dev are active — Holdout remains locked
Judge prompts refined on dev, metric thresholds calibrated on dev. Holdout never gets touched.
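One way to make "version everything" concrete is a small manifest with one changelog entry per version. The schema and helper below are illustrative assumptions, not the lesson's required format.

```python
# Sketch of dataset versioning: each change to train/dev gets a version tag,
# a changelog note, and a content hash so archived files can be audited later.
import hashlib

def record_version(manifest, version, date, n_traces, note, dataset_bytes):
    """Append a changelog entry with a content hash for later audit."""
    manifest["versions"].append({
        "version": version,
        "date": date,
        "n_traces": n_traces,
        "note": note,
        # the hash lets you verify an archived file is the exact one you used
        "sha256": hashlib.sha256(dataset_bytes).hexdigest(),
    })

manifest = {"name": "AI Data Analyst Test Set", "versions": []}
record_version(manifest, "v1.0", "2025-09-15", 200,
               "Initial dataset from production sample", b"<file bytes>")
record_version(manifest, "v1.1", "2025-10-10", 220,
               "Added 20 multi-table join queries", b"<new file bytes>")
print(manifest["versions"][-1]["version"])  # v1.1
```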

Rotation is the fix for inevitable staleness — refresh every N months or after major product changes

Test Set v2.3: 200 examples
  • Quarterly rotation cadence: retire the bottom 20% by age, add fresh examples from production
  • Drift detection: KS test (distribution difference check) with a p-value threshold of 0.05
  • If the test set no longer matches production, rotate immediately
Most ML models experience measurable performance degradation within their first year in production
Staleness is inevitable. Rotation is the mitigation.
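The drift check above can be sketched with scipy's two-sample KS test on a numeric "query complexity" score (e.g. number of tables referenced). The scores below are synthetic draws for illustration, not real data.

```python
# Sketch of the rotation trigger: compare query-complexity distributions
# between the test set and recent production traces with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
test_set_complexity = rng.poisson(lam=1.2, size=200)    # mostly simple lookups
production_complexity = rng.poisson(lam=2.0, size=500)  # more multi-table joins

stat, p_value = ks_2samp(test_set_complexity, production_complexity)
if p_value < 0.05:                 # the rotation threshold from the slide
    print(f"Drift detected (KS={stat:.2f}, p={p_value:.3f}) -> rotate now")
else:
    print("No significant drift; keep current test set")
```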

Archive with full provenance — reproducibility requires access to historical datasets

Retired Datasets — Read-Only Archive
Full provenance for reproducibility and audit trail
  • Regression Suite v1.8, retired 2025-Q3: product architecture change
  • Judge Calibration Set v2.0, retired 2025-Q4: distribution drift exceeded threshold
  • Initial Dev Set v1.0, retired 2025-Q2: graduated to regression suite
Do not delete: ship decisions from Q2 that reference v2.3 must remain verifiable in Q4.

Five rules govern the entire lifecycle

  • Split-first rule — partition before any development work
  • Versioning rule — every change gets a tag and changelog
  • Holdout protection rule — read-only until final pre-ship validation
  • Rotation cadence — refresh quarterly or after major product changes
  • Contamination detection — monitor whether test set still matches production patterns
These rules prevent teaching to the test and staleness
Without these disciplines, your offline metrics stop reflecting production reality

Will holdout judge agreement score be higher, lower, or the same as 0.82?

Scenario: a 150-query test set. The judge's dev-set agreement score (Kappa, how often the AI judge agrees with human experts) improved from 0.65 to 0.82 over 12 rounds of refining the judge's instructions.
Higher
Lower
Same

Split statistics: 20/90/90, stratified to match proportions across failure categories

Split      Count   Retrieval Failures   SQL Errors   Hallucinations   Clean
Train      20      15%                  25%          10%              50%
Dev        90      16%                  24%          11%              49%
Holdout    90      14%                  26%          10%              50%
Original   200     15%                  25%          10%              50%
Proportions preserved — each split matches original distribution
If train had 50% retrieval failures but original had 15%, the split didn't maintain proportions correctly
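A stratified split produces proportions like the table above by splitting within each failure category. The sketch below uses synthetic labels matching the slide's 15/25/10/50 mix; exact split sizes can be off by one or two due to rounding.

```python
# Sketch of a stratified split: shuffle and cut each failure category
# separately so every split keeps roughly the original category mix.
import random
from collections import Counter

# 200 synthetic traces with the slide's category shares (15/25/10/50%)
CATEGORIES = (["retrieval"] * 30 + ["sql"] * 50 +
              ["hallucination"] * 20 + ["clean"] * 100)

def stratified_split(labels, fracs=(0.10, 0.45, 0.45), seed=7):
    """Return (train, dev, holdout) index lists, stratified by label."""
    rng = random.Random(seed)
    splits = ([], [], [])
    by_cat = {}
    for i, cat in enumerate(labels):
        by_cat.setdefault(cat, []).append(i)
    for idxs in by_cat.values():
        rng.shuffle(idxs)
        a = int(len(idxs) * fracs[0])
        b = a + int(len(idxs) * fracs[1])
        splits[0].extend(idxs[:a])
        splits[1].extend(idxs[a:b])
        splits[2].extend(idxs[b:])
    return splits

train, dev, holdout = stratified_split(CATEGORIES)
for name, idxs in [("train", train), ("dev", dev), ("holdout", holdout)]:
    mix = Counter(CATEGORIES[i] for i in idxs)
    print(name, len(idxs), {c: round(n / len(idxs), 2) for c, n in mix.items()})
```

A plain random split would usually land close to these proportions too, but stratifying guarantees it, which matters for a small 20-trace train split.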

Distribution divergence: KS = 0.18, p = 0.03 on query complexity — test set no longer matches production

Query type          Test Set   Production (recent)
Simple lookups      55%        35%
Multi-table joins   25%        40%
Trend analyses      15%        18%
Comparisons         5%         7%
KS statistic = 0.18, p = 0.03 → Test set drifted
Production users now ask more complex queries than your test set represents
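The slide reports a KS result on a numeric complexity score; for the categorical query-type shares above, a chi-square test of homogeneity is a common alternative. Using chi-square here is my substitution for illustration, not the lesson's stated method, and the counts are derived from the slide's percentages.

```python
# Sketch: compare categorical query-type distributions between the test set
# and recent production traces with a chi-square test of homogeneity.
from scipy.stats import chi2_contingency

# Counts implied by the slide's shares: 200 test examples, 1000 prod traces
types = ["simple", "join", "trend", "comparison"]
test_counts = [110, 50, 30, 10]      # 55 / 25 / 15 / 5 %
prod_counts = [350, 400, 180, 70]    # 35 / 40 / 18 / 7 %

chi2, p, dof, _ = chi2_contingency([test_counts, prod_counts])
print(f"chi2={chi2:.1f}, dof={dof}, p={p:.4f}")
if p < 0.05:
    print("Query-type mix has drifted; test set no longer matches production")
```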

Build a Dataset Management Spec with rotation policy, contamination checks, and retirement criteria

Base Version (20-25 min)
  • Split 200 traces into train/dev/holdout
  • Version the dataset
  • Run divergence detection
  • Identify rotation candidates
  • Fill in Dataset Management Spec template
Extend Version (+10-15 min)
  • Implement divergence detection from scratch using scipy
  • Investigate split ratio effects on holdout power
  • Simulate rotation impact on metric stability
Deliverable: Dataset Management Spec with rotation policy and contamination checks
Plus divergence detection results showing which dimensions drifted

Dataset Management Spec flagged 0.18 KS divergence on query complexity

Dataset Management Spec
Dataset Name: AI Data Analyst Test Set
Current Version: v2.3
Lifecycle Stage: Test Set Management
Split Config: 10% / 45% / 45%
Rotation Policy: Quarterly or KS p < 0.05
Contamination Checks: Monthly divergence scan
Holdout Protection: Locked until final validation
Retirement Criteria: Drift threshold or architecture change
Divergence Detection
KS = 0.18, p = 0.03
Query complexity distribution has drifted significantly. Production users now favor multi-table joins (40%) vs test set (25%).
This is share-ready work — evidence that your evaluation data is trustworthy
In lesson 5.5, the launch readiness gate will ask: is the evaluation dataset current?

Never rotating your suite means measuring yesterday's distribution, not today's

Q1 (build suite): 150 examples, 70% simple lookups
Q2-Q3 (behavior shifts): multi-turn queries increase
Q4 (ship decision): 78% pass offline, 55% in production. The offline metric measured the wrong distribution.
Most ML models experience measurable performance degradation within their first year in production. Staleness is not an edge case.
Fix: Run distribution divergence quarterly and rotate when KS p-value < 0.05

Three judgment scenarios on drift, splits, and rotation

Scenario 1
Test set created 9 months ago. KS p-values (distribution difference test): 0.03 (query complexity), 0.42 (user role), 0.01 (failure category). Which dimensions drifted? What should you do?
Scenario 2
Split: 80% dev, 20% holdout. After 5 months of iteration: dev-set agreement score (Kappa) = 0.85, holdout agreement score = 0.71. Your colleague says 'the judge is bad.' What's a more likely explanation?
Scenario 3
200 test set examples. Production generates 5000 new traces per quarter. Quarterly rotation policy. How many to retire each quarter? What criteria guide which ones to retire?
Pause here and work through these judgment questions

Four-stage timeline: Creation → Development → Test Set Management → Retirement

Stage 1: Creation
Split BEFORE any development work. Train (10%), Dev (45%), Holdout (45%) — locked immediately.
Stage 2: Development
Iterate on train and dev only. Version everything (like code): v1.0 → v1.1 → v1.2. Holdout stays locked.
Stage 3: Test Set Management
Dev graduates to locked test set. Rotation begins: retire old examples, add fresh production samples. Monitor drift.
Stage 4: Retirement
Archive with full history when drift exceeds threshold or product changes. Read-only for reproducibility.
Five governing rules: split-first, version everything, protect holdout, rotate on cadence, detect contamination
Each transition has a decision gate — you don't skip stages or cycle backward

Next: Experiment Design for AI Systems (Dealing with Randomness)

Prompt A: metric ± uncertainty range
Prompt B: metric ± uncertainty range
Statistical test: is the difference real or random noise?
How do you design an A/B test when your system gives different answers each time?
AI Analyst Lab | AI Evals for Product Dev | Week 5 Lesson 2 | aianalystlab.ai