Scaling instrumentation

Week 2 Lesson 6 · AI Evals for Product Dev
Shane Butler · AI Analyst Lab

If you deployed your L2.3 trace spec to 100K queries/day, what cost and operational challenges would you face?

Think about platform costs (ingestion, indexing, querying) and automated evaluators running on every trace.

100K queries/day × $0.02/trace × 30 days = $60,000/month before evaluators

Metric                                     Value
Queries per day                            100,000
Traces per month (full logging)            3,000,000
All-in cost per trace (managed platform)   $0.02
Monthly cost (full logging)                $60,000
Before adding: automated evaluators, retention overhead, and query costs on that data
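The headline figure can be checked with simple arithmetic; a minimal sketch using the assumed values from the table above:

```python
# Cost model using the assumed values from the table above.
queries_per_day = 100_000
days_per_month = 30
cost_per_trace = 0.02        # all-in managed-platform cost, $ per trace

traces_per_month = queries_per_day * days_per_month   # 3,000,000 at full logging
monthly_cost = traces_per_month * cost_per_trace      # $60,000 before evaluators
```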

The Sampling Strategy Design Triangle constrains what's feasible

• Evidence sufficiency — can you catch the failures that matter?
• Cost constraints — platform budget, retention policy
• Operational reliability — logging without slowing down users
A feasible strategy sits at the intersection of all three.

Detection requirements: rare failures drive minimum sample size

Common failures (5-15% base rate)
  • Retrieval errors: 15%
  • Hallucination: 15%
  • SQL syntax: 12%
  • Empty results: 8%
Easy to detect — show up in small samples
Rare failures (0.5-2% base rate)
  • Context loss (multi-turn): 2%
  • Security violations: 1%
  • Numeric overflows: 0.5%
Need 500+ samples for 90% confidence
The rarest type of failure is the one that determines your minimum sampling rate
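A standard back-of-envelope for sample counts: the smallest n with at least a 90% chance of observing one or more instances of a failure with base rate p satisfies 1 − (1 − p)^n ≥ 0.9. A minimal sketch of that formula (note the lesson's table quotes larger, more conservative counts — e.g. 500 rather than ~460 for a 0.5% failure — leaving margin to capture several instances, not just one):

```python
import math

def min_samples(base_rate: float, confidence: float = 0.90) -> int:
    """Smallest n such that P(>= 1 occurrence) = 1 - (1 - p)^n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - base_rate))
```

For a 0.5% failure this gives 460 samples; for a 15% failure, only 15.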

Cost constraints: budget determines max sampling rate

Observability budget: $250/month
÷ all-in cost per trace ($0.02) = 12,500 traces/month affordable
Total traces at full logging: 3,000,000/month
12,500 ÷ 3,000,000 = 0.42% maximum sampling rate
Budget caps sampling at 0.42% — detection needs 8%+ → tradeoff required
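The budget arithmetic above, as a sketch:

```python
budget_per_month = 250.0            # observability budget, $
cost_per_trace = 0.02               # all-in cost per stored trace, $
traces_at_full_logging = 3_000_000  # traces/month at 100% sampling

affordable_traces = budget_per_month / cost_per_trace           # 12,500 traces/month
max_sampling_rate = affordable_traces / traces_at_full_logging  # ~0.42%
```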

Operational constraints: logging can't degrade latency

User Request → Handler (non-blocking) → Response to User
Handler → Background Queue → Background Workers → Storage
The handler only enqueues and returns; the queue waits only at shutdown, draining to storage so everything is saved before exit.
Zero user-facing latency impact.
Logging that blocks users adds 50ms per request — at ~100 requests per user per day, that's 5s of waiting per user per day.
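A minimal sketch of the non-blocking pattern above, using only Python's standard library (all names here are hypothetical — `store` stands in for your trace platform's write call):

```python
import queue
import threading

trace_queue: queue.Queue = queue.Queue(maxsize=10_000)  # bounded queue = backpressure
stored_traces: list = []

def store(trace: dict) -> None:
    stored_traces.append(trace)    # stand-in for the real trace-platform write

def handle_request(request: dict) -> str:
    response = f"answer for {request['q']}"      # the actual user-facing work
    try:
        trace_queue.put_nowait({"request": request, "response": response})
    except queue.Full:
        pass                       # under pressure, drop the trace; never block the user
    return response

def worker() -> None:
    while True:
        trace = trace_queue.get()
        if trace is None:          # shutdown sentinel: queue is drained FIFO first
            break
        store(trace)
```

The bounded queue gives backpressure: when it fills, traces are dropped rather than making users wait.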

Dynamic sampling allocates budget to high-risk periods

• Pre-deploy: 10%
• Deploy (first 48h): 50%
• Steady state: 10%
• Error spike: 25%
• Q4 surge: 15%
Allocate budget where risk is highest
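The schedule above can be expressed as a simple rate selector; a sketch where the field names are assumptions and the rates are this lesson's examples:

```python
def sampling_rate(state: dict) -> float:
    # Conditions checked in priority order; first match wins.
    if state.get("hours_since_deploy", float("inf")) < 48:
        return 0.50    # deploy window
    if state.get("error_rate", 0.0) > 0.05:
        return 0.25    # error spike
    if state.get("seasonal_peak", False):
        return 0.15    # e.g. Q4 surge
    return 0.10        # pre-deploy / steady-state base rate
```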

Trace sampling and eval sampling are two separate decisions

Decision layer                                          Constraint      Example policy         Cost impact
Trace sampling (what to store)                          Platform cost   10% sampling           $6,000/month platform
Eval sampling (which stored traces get quality checks)  Compute cost    50% of stored traces   $5/day automated check calls
Coordinated, not conflated
If you judge 100% of stored traces, raising trace sampling from 10% to 20% silently doubles your automated-check costs too — the two rates must be set together.
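The coupling between the two decisions is easy to see in code (a sketch with assumed names):

```python
def monthly_eval_calls(queries_per_month: int, trace_rate: float, eval_rate: float) -> float:
    """Automated-check volume depends on BOTH sampling decisions."""
    return queries_per_month * trace_rate * eval_rate

# 10% traced, 100% of stored traces judged: 300K eval calls/month.
# Raising trace sampling to 20% doubles eval volume too,
# unless eval_rate is lowered to compensate.
```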

What minimum sampling rate detects a 0.5% failure with 90% confidence?

System: 100,000 queries per day
Failure modes: 10 categories
Rarest failure: 0.5% base rate (how often it occurs)
Target: 90% detection confidence (you'd catch it 9 out of 10 times)

Minimum sampling rate = ?
Write your prediction before running Cell 7

The rarest failure mode (0.5%) requires at least 8% sampling

Failure mode          Base rate   Samples needed (90% confidence)   Min % of queries to log
Retrieval errors      15%         50                                0.05%
Hallucination         15%         50                                0.05%
SQL syntax            12%         60                                0.06%
Context loss          2%          350                               0.35%
Security violations   1%          450                               0.45%
Numeric overflows     0.5%        500                               0.5%
Binding constraint: need 500 samples = 0.5% min sampling. Practical target: 8-10% with margin for variance.
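The last column follows directly from dividing samples needed by daily volume; a sketch using the table's counts:

```python
queries_per_day = 100_000
samples_needed = {        # 90%-confidence counts from the table above
    "retrieval_errors": 50, "hallucination": 50, "sql_syntax": 60,
    "context_loss": 350, "security_violations": 450, "numeric_overflows": 500,
}
min_log_fraction = {k: n / queries_per_day for k, n in samples_needed.items()}
binding_rate = max(min_log_fraction.values())   # 0.005 -> 0.5% minimum sampling
```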

Cost-volume-quality frontier shows 10% as the operating point

[Chart: platform cost (rising linearly, $0 to $60K) and detection coverage (climbing steeply, then saturating near 100%) plotted against sampling rate from 0% to 100%. Operating point marked at 10% sampling: $6,000/month, 95% coverage.]
Feasible: meets detection + cost constraints

Tiered retention (7d/90d/indefinite) cuts costs 60% with minimal loss

Retention strategy                                        Traces stored             Monthly cost   Information preserved
90-day full retention (10% sampled)                       300K traces/month         $6,000         100% — all sampled traces
Tiered: 7d full detail / 90d summaries / trends forever   ~120K trace-equivalents   $2,400         95% — forensic detail for 7d, metrics for 90d, trends indefinitely
60% cost savings, <5% information loss
Forensic debugging needs are highest in first 7 days — after that, aggregated metrics suffice
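The savings arithmetic, as a sketch using the table's assumed figures:

```python
cost_per_trace = 0.02                        # $ per stored trace-equivalent
full_retention = 300_000 * cost_per_trace    # 10% sampled, 90-day full: $6,000/month
tiered = 120_000 * cost_per_trace            # ~120K trace-equivalents: $2,400/month
savings = 1 - tiered / full_retention        # 0.6 -> 60% cost savings
```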

Design a dynamic sampling policy with 3 risk-triggered conditions

Condition 1: Deployment
48h at 50% after v2 deploy
Condition 2: Error Spike
24h at 25% when error rate >5%
Condition 3: Seasonal Peak
12d at 15% during Q4 surge (Dec 20-31)
Each condition specifies: trigger, temporary_rate, duration, rampdown, cost_estimate
Extend version (DS/Eng, +15-20 min)
Advanced version: log more from risky situations — like complex multi-step conversations or when executives are using the tool — and less from simple one-question lookups. This focuses your budget where failures are most likely and most expensive.
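A sketch of the advanced version; the stratum names and rates here are hypothetical examples, not prescribed values:

```python
import random

STRATUM_RATES = {
    "multi_turn": 0.30,       # complex conversations: failures likelier
    "executive_user": 0.50,   # failures here are most expensive
    "simple_lookup": 0.02,    # cheap, low-risk one-question queries
}

def stratum_rate(query: dict) -> float:
    return STRATUM_RATES.get(query.get("stratum"), 0.10)  # fall back to base rate

def should_log(query: dict, rng=random.random) -> bool:
    # rng is injectable so the decision is deterministic under test.
    return rng() < stratum_rate(query)
```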

Sampling Strategy Specification: production-ready config for your infra team

• Base sampling rate: 10%
• Detection coverage table (all modes 90%+)
• Cost breakdown: $6,000/month base + $4,500 dynamic
• Dynamic sampling rules (3 conditions)
• Tiered retention (7d/90d/indefinite)
• Operational requirements (async queue, backpressure)
• Review cadence (quarterly or after traffic changes)
This document controls your largest observability cost lever

Sampling without detection analysis leaves critical failures invisible

Without detection analysis
• Pick 5% sampling (arbitrary)
• 0.5% failure occurs 50 times/day
• Sample captures 2-3 instances
• Result: Not enough for 90% confidence
Critical failure invisible
With detection analysis
• Calculate 500 samples needed for 0.5% failure
• Set 8-10% sampling to capture 500+
• Sample captures 40-50 instances
• Result: 90%+ detection confidence
Failure detected, debuggable

0.3% failure + 5% budget: sample uniform, stratify, increase budget, or accept blindness?

Scenario:
• 10 failure modes (9 at 5-10%, one at 0.3%)
• Need 90% confidence to detect all
• Cost budget allows 5% sampling max
(a) Sample 5% uniformly — simple, but won't detect the 0.3% failure
(b) Stratified sampling — oversample the risky queries
(c) Increase budget — afford a higher sampling rate
(d) Accept the 0.3% failure as invisible — focus on the other nine
Which do you recommend? Justify.

The triangle: evidence, cost, operations — feasible strategy at center

Strategy              Evidence        Cost            Operations
Full logging (100%)   High            Fail ($60K+)    Simple
10% uniform           Medium (90%+)   Pass ($6K)      Simple
Dynamic 5-25%         Medium (90%+)   Pass ($10.5K)   Complex

Next: Grounding evaluation in user value

You've built metrics and instrumentation.
Now: map them to what users actually care about.
AI Analyst Lab | AI Evals for Product Dev | Week 2 Lesson 6 | aianalystlab.ai