2026 · 06 · 21
Cutting two high-volume LLM stages to roughly $0 by treating cost as an eval problem: build the measurement first, prove that a free open-weight model matches the paid one (and beats it on extraction), then ship an eval-gated router. Plus the honest sequel: the $0 swap that quietly froze ingestion for a week, and what the eval never measured.
case study · 16 min

2026 · 06 · 16
The marketing question was 'does a great World Cup actually move a player's transfer market?' The naive answer, inferring the link ourselves, is unprovable from observational data and would break the product's honesty firewall. The honest move was to stop inferring and instead detect where a journalist had already stated the link, attribute it, and measure whether a move follows. A measurement-first pass showed a real but small signal (n=9), which is exactly why the next step is to capture labelled data rather than re-measure a tiny sample.
case study · 11 min