Similarity + retrieval metrics

Week 3 Lesson 4 · AI Evals for Product Dev
Shane Butler · AI Analyst Lab

Retrieval quality is a system signal — which signal category does it fall into, and why does that matter for when you measure it in the pipeline?

System signals: measure internal pipeline state before output generation
Output quality signals: measure generated output correctness and quality
User behavior signals: measure user interaction with outputs
Retrieval: system signal measured here
Generation: output quality measured here

When retrieval fails, every downstream component inherits that failure

Retrieval: returns wrong context (desktop vs mobile)
SQL Generation: valid SQL, real tables
Narrative: faithful to SQL results
Final output: chart looks plausible, numbers are real, but describes the wrong user segment
End-to-end metrics show SQL execution success = 0.91, narrative faithfulness = 0.78. System looks fine, but the problem is upstream.
Root cause: Retrieval failure masked by downstream success metrics

Similarity metrics measure closeness — retrieval metrics measure relevance

Similarity metrics: how close are two embeddings?
Examples: cosine similarity, Euclidean distance
Use cases: embedding model selection, threshold tuning
Retrieval ranking metrics: did the system surface the right documents?
Metrics: P@k, R@k, MRR, NDCG
Use cases: retrieval system evaluation, ranking quality assessment
Close does not mean relevant
A query about mobile conversion might have high similarity to a document about desktop conversion because both mention "conversion" — but the document is not relevant.
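The "close does not mean relevant" trap is easy to demonstrate. A minimal sketch, assuming toy 3-dimensional vectors invented for illustration (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: both texts mention "conversion", so they
# land close together even though one covers the wrong segment.
query_mobile = [0.9, 0.8, 0.1]  # "mobile conversion" query (made up)
doc_desktop = [0.8, 0.9, 0.2]   # desktop-conversion doc (made up)

print(round(cosine_similarity(query_mobile, doc_desktop), 2))  # ~0.99
```

The pair scores near-perfect similarity, which is exactly why a cosine threshold alone cannot certify relevance.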

Precision@k: Of the top k retrieved, what fraction are relevant?

Rank 1: doc_schema_users (✓ relevant)
Rank 2: doc_metrics_conversion (✓ relevant)
Rank 3: doc_schema_desktop (✗ irrelevant)
Rank 4: doc_schema_sessions (✓ relevant)
Rank 5: doc_metrics_engagement (✗ irrelevant)
Precision@5 = 3 / 5 = 0.60
High precision = low noise in context window
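The calculation above takes a handful of lines. A sketch using the doc IDs from the example (the helper name precision_at_k is my own, not the lab's API):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k

retrieved = ["doc_schema_users", "doc_metrics_conversion",
             "doc_schema_desktop", "doc_schema_sessions",
             "doc_metrics_engagement"]
relevant = {"doc_schema_users", "doc_metrics_conversion",
            "doc_schema_sessions"}

print(precision_at_k(retrieved, relevant, 5))  # 3 relevant in top 5 -> 0.6
```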

Recall@k: Of all relevant documents, what fraction appear in the top k?

All relevant docs in corpus (4 total):
• doc_schema_users
• doc_schema_sessions
• doc_metrics_conversion
• doc_context_mobile
Top k=5 retrieved (3 of 4 relevant docs retrieved):
✓ doc_schema_users
✓ doc_metrics_conversion
✓ doc_schema_sessions
✗ doc_context_mobile (MISSING)
+ 2 irrelevant documents
Recall@5 = 3 / 4 = 0.75
High recall = comprehensive coverage
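Recall@k divides by the size of the full relevant set instead of k. A sketch with the four relevant docs from the example (the two irrelevant filler names are placeholders):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    found = set(retrieved[:k]) & set(relevant)
    return len(found) / len(relevant)

retrieved = ["doc_schema_users", "doc_metrics_conversion",
             "doc_schema_sessions", "doc_filler_a", "doc_filler_b"]
relevant = {"doc_schema_users", "doc_schema_sessions",
            "doc_metrics_conversion", "doc_context_mobile"}

print(recall_at_k(retrieved, relevant, 5))  # 3 of 4 relevant found -> 0.75
```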

MRR optimizes for "first good answer" scenarios

First relevant at rank 1 → MRR = 1.0 (perfect)
First relevant at rank 2 → MRR = 1/2 = 0.50
First relevant at rank 5 → MRR = 1/5 = 0.20 (poor)
Mean Reciprocal Rank (MRR)
Reciprocal rank = 1 / (rank of first relevant document); MRR is the average reciprocal rank across all queries, giving a system-level score. Use it for products where users need one good answer fast.
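Per-query reciprocal rank and its average can be sketched as follows (the three toy queries mirror the rank 1/2/5 cases above; doc IDs are made up):

```python
def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document; 0.0 if none retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (retrieved, relevant) pairs."""
    return sum(reciprocal_rank(ret, rel) for ret, rel in queries) / len(queries)

queries = [
    (["a", "x", "y"], {"a"}),            # first relevant at rank 1 -> 1.0
    (["x", "b", "y"], {"b"}),            # first relevant at rank 2 -> 0.5
    (["x", "y", "z", "w", "c"], {"c"}),  # first relevant at rank 5 -> 0.2
]
print(mean_reciprocal_rank(queries))  # (1.0 + 0.5 + 0.2) / 3
```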

NDCG ranks highly relevant above partially relevant

Actual ranking (system retrieval order)
Rank 1: relevance 2 (highly relevant)
Rank 2: relevance 0 (irrelevant)
Rank 3: relevance 1 (partial)
Rank 4: relevance 0 (irrelevant)
Rank 5: relevance 0 (irrelevant)
Ideal ranking (perfect ordering, NDCG = 1.0)
Rank 1: relevance 2 (highly relevant)
Rank 2: relevance 1 (partial)
Rank 3: relevance 0 (irrelevant)
Rank 4: relevance 0 (irrelevant)
Rank 5: relevance 0 (irrelevant)
NDCG uses graded relevance + position discounting
Normalized Discounted Cumulative Gain. Highly relevant documents at rank 1 contribute more than at rank 5. Compare actual DCG to ideal DCG, normalize to get NDCG.
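A sketch of the DCG/NDCG computation with a log2(rank + 1) position discount. Note that implementations differ on the gain function (linear rel vs exponential 2^rel - 1), so numbers from other libraries may not match this sketch:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Actual DCG normalized by the DCG of the ideal (sorted) ordering."""
    return dcg(relevances) / dcg(sorted(relevances, reverse=True))

# Graded labels from the slide: 2 = highly relevant, 1 = partial, 0 = irrelevant.
actual = [2, 0, 1, 0, 0]
print(round(ndcg(actual), 2))  # ideal order [2, 1, 0, 0, 0] scores exactly 1.0
```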

Choose the metric based on your product's retrieval objective

Product | Required metric | Reason
Legal search tool | High recall@k | Missing a precedent is dangerous
QA chatbot | High MRR | First answer must be right
LLM context window (3 docs) | High precision@k | Every irrelevant document wastes tokens
The metric must match your product requirement

What will precision@3 and recall@3 be for this retrieval result?

Retrieval results for query: "What was mobile checkout conversion in Q4?"
Rank 1: doc2 = doc_schema_mobile_conversion (relevant)
Rank 2: doc5 = doc_metrics_checkout (relevant)
Rank 3: doc1 = doc_schema_desktop (irrelevant)
Context: 3 total relevant documents exist in the corpus. Write your predictions before running the next cell.

Precision@3 and recall@3 both equal 0.667 — but measure different properties

Precision@3 = 2/3 = 0.667 (2 of the top 3 documents are relevant)
Recall@3 = 2/3 = 0.667 (2 of the 3 total relevant documents appear in the top 3)
Precision and recall measure different properties
They only equal each other when k = total relevant documents. Precision measures noise (are retrieved docs relevant?). Recall measures coverage (did you find all relevant docs?).
Increasing k to improve recall typically decreases precision (more irrelevant documents retrieved)
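The k sweep below makes the tradeoff concrete for one hypothetical ranking (doc names are placeholders) with relevant documents at ranks 1, 2, and 6:

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for one ranked list."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k, hits / len(relevant)

retrieved = ["r1", "r2", "x1", "x2", "x3", "r3", "x4", "x5"]
relevant = {"r1", "r2", "r3"}

for k in (2, 4, 6, 8):
    p, r = precision_recall_at_k(retrieved, relevant, k)
    print(f"k={k}: precision={p:.2f}, recall={r:.2f}")
```

Growing k from 2 to 8 lifts recall from 0.67 to 1.00 while precision falls from 1.00 to 0.38.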

Retrieval metrics computed for a single query across 5 ranked documents

Rank | Document ID | Relevance | Counts for P@3 | MRR (1/rank)
1 | doc_schema_users | 2 (relevant) | Yes | 1.000
2 | doc_schema_sessions | 0 (irrelevant) | No | --
3 | doc_metrics_conversion | 1 (partial) | Yes | --
4 | doc_schema_desktop | 0 (irrelevant) | -- | --
5 | doc_metrics_engagement | 0 (irrelevant) | -- | --
Precision@3 = 0.667
Recall@3 = 0.667
MRR = 1.000
NDCG@5 = 0.82

Build a Retrieval Quality Report for the AI Data Analyst

Base version (all students, 20-25 min)
  • Compute mean P@3, R@3, MRR, NDCG@5 across 300 queries
  • Interpret the precision/recall tradeoff
  • Fill the Retrieval Quality Report template
Extend version (DS/Eng, +10-15 min)
  • Filter to adversarial queries, recompute metrics
  • Identify top 5 queries with largest precision drop
  • Compute schema match rate for oracle queries
  • Show one retrieval miss leading to SQL failure
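Aggregating a per-query metric over an eval set might look like the sketch below; eval_set, the doc names, and the mean_metric helper are illustrative placeholders, not the lab's actual dataset or API:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k

def mean_metric(eval_set, metric):
    """Average a per-query metric over (retrieved, relevant) pairs."""
    return sum(metric(ret, rel) for ret, rel in eval_set) / len(eval_set)

# Toy stand-in for the 300-query eval set.
eval_set = [
    (["a", "b", "x"], {"a", "b"}),
    (["x", "c", "y"], {"c"}),
    (["d", "x", "e"], {"d", "e"}),
]
mean_p3 = mean_metric(eval_set, lambda ret, rel: precision_at_k(ret, rel, 3))
print(round(mean_p3, 3))
```

The same mean_metric call works for R@3, MRR, or NDCG@5 by swapping the lambda.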

Retrieval Quality Report with aggregate metrics and adversarial segment analysis

Retrieval Metrics: Non-adversarial vs Adversarial Queries
Precision@3: non-adversarial = 0.88, adversarial = 0.63
Adversarial queries show a 25-point drop in precision@3 (0.88 to 0.63)
What you built: Retrieval quality report showing P@3 = 0.88 for non-adversarial queries, dropping to 0.63 for adversarial queries, and 72% of oracle queries missing correct schemas.
This artifact becomes part of your running metric suite alongside LLM judge metrics

Four ways teams misuse retrieval metrics

  • Conflating retrieval and generation failures — measure end-to-end only, can't isolate root cause
  • Choosing precision when you need recall — legal search optimizes for selectivity, misses 60% of relevant cases
  • Ignoring adversarial retrieval failures — aggregate metric masks segment-level degradation
  • Using cosine similarity thresholds without validation — 0.75 threshold is too high for some queries, too low for others

When precision and recall conflict, which metric wins?

Scenario 1
Retrieval system: P@5 = 0.90, R@5 = 0.45 — PM asks: Is this good to ship?
Scenario 2
Strong retrieval metrics (P@3 = 0.85, NDCG@5 = 0.78) but SQL generation still fails — how do you diagnose?
Scenario 3
Model A: higher cosine similarity but lower NDCG@5. Model B: lower cosine similarity but higher NDCG@5 — which metric drives your decision?
Think through these scenarios before the next lesson
What additional context do you need? What framework would you apply? How does product objective influence your answer?

Retrieval Evaluation: Metrics by Product Objective

Axes: relevance type (binary relevant/irrelevant vs graded) and retrieval objective (first good answer vs comprehensive coverage)
• MRR (binary, first good answer): QA systems, simple search
• Recall@k (binary, comprehensive coverage): legal search, research
• Precision@k (binary, limited context windows): every irrelevant document wastes tokens
• NDCG (graded relevance): recommendation, multi-doc synthesis
Two principles
Evaluate retrieval separately from generation. Choose the metric based on your product requirement, not by default.

Next: Semantic metrics with human and model judges

• When exact-match and retrieval metrics aren't enough
• Building LLM-as-judge scorers for semantic quality
• Human-judge baselines and calibration
AI Analyst Lab | AI Evals for Product Dev | Week 3 Lesson 4 | aianalystlab.ai