Transferithm — Rishabh Natarajan

What it is

Transferithm turns the noise of the transfer rumour mill into a single, source-graded feed you can follow one club at a time. Scheduled jobs scrape around 60 football outlets; an LLM extraction step keeps an article only if it names both a player and a destination club; and a scoring layer (RITHM) grades each linked move on source tier (1 to 5), independent corroboration, and contract, age, fit, and timing signals. Every linked player gets one of four statuses: Rumoured, Deal Claimed, Near-Official, or Official.
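The keep rule in the extraction step can be sketched as below. This is a minimal illustration, not the production code: the Extraction shape, its field names, and the example URLs are assumptions made for the sketch, and only the four status labels come from the description above.

```typescript
// Hypothetical shape of one LLM extraction result (field names are assumptions).
type Extraction = {
  articleUrl: string;
  player: string | null;          // null when the article names no player
  destinationClub: string | null; // null when no destination club is named
};

// The four statuses named in the description.
type Status = "Rumoured" | "Deal Claimed" | "Near-Official" | "Official";
const STATUSES: readonly Status[] = ["Rumoured", "Deal Claimed", "Near-Official", "Official"];

// Keep an article only if it names both a player and a destination club.
function keepArticle(e: Extraction): boolean {
  const hasText = (s: string | null): boolean => s !== null && s.trim().length > 0;
  return hasText(e.player) && hasText(e.destinationClub);
}

// Placeholder data: only the first entry names both a player and a club.
const extractions: Extraction[] = [
  { articleUrl: "https://example.com/a", player: "Player A", destinationClub: "Club X" },
  { articleUrl: "https://example.com/b", player: "Player B", destinationClub: null },
  { articleUrl: "https://example.com/c", player: "  ", destinationClub: "Club Y" },
];

const kept = extractions.filter(keepArticle);
console.log(kept.map((e) => e.articleUrl), STATUSES.length);
```

Treating a blank string the same as a missing value is a choice made for this sketch; it guards against an extraction step that returns empty fields instead of nulls.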

It is built around the 2026 World Cup as the launch moment: the site also ranks World Cup players by tournament form (measured against others in the same position) and shows which standouts already carry transfer rumours — and whether the link predates the tournament or only appeared after a big game.
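One way to express "form measured against others in the same position" and the before-or-after question is sketched below. The formScore field, the percentile definition, and the TOURNAMENT_START constant are assumptions for illustration; the site's actual form metric is not specified here, and the date should be checked against the official schedule.

```typescript
// Hypothetical per-player form record; formScore stands in for whatever form metric is used.
type PlayerForm = { name: string; position: string; formScore: number };

// Share of same-position players whose form score is at or below this player's (0 to 1).
// Returns 0 when no same-position players exist in the pool.
function positionPercentile(player: PlayerForm, all: PlayerForm[]): number {
  const peers = all.filter((p) => p.position === player.position);
  if (peers.length === 0) return 0;
  const atOrBelow = peers.filter((p) => p.formScore <= player.formScore).length;
  return atOrBelow / peers.length;
}

// Assumed tournament start date, used only to classify when a link first appeared.
const TOURNAMENT_START = new Date("2026-06-11T00:00:00Z");

type LinkTiming = "pre-tournament" | "during-or-after";

// Classify a rumour by whether its first sighting predates the tournament.
function linkTiming(firstLinkedAt: Date, tournamentStart: Date = TOURNAMENT_START): LinkTiming {
  return firstLinkedAt.getTime() < tournamentStart.getTime() ? "pre-tournament" : "during-or-after";
}

// Placeholder data, not real players or scores.
const players: PlayerForm[] = [
  { name: "Forward A", position: "FW", formScore: 8.0 },
  { name: "Forward B", position: "FW", formScore: 6.0 },
  { name: "Forward C", position: "FW", formScore: 7.0 },
  { name: "Keeper D", position: "GK", formScore: 7.5 },
];

for (const p of players) {
  console.log(p.name, positionPercentile(p, players).toFixed(2));
}
console.log(linkTiming(new Date("2026-03-01T00:00:00Z")));
```

Comparing only within a position keeps a goalkeeper's score from being ranked against a forward's, which is the point of the same-position comparison described above.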

What it is not — the honesty firewall

Transferithm does not predict or forecast transfers, and it is not a betting or odds engine. It reports what credible sources have already said, attributes every claim, and then measures whether the move actually follows. It is a deterministic data pipeline, not an agent system.

The deliverable is signal plus a verifiable track record, not another rumour reposter and not faux-precision. An early version surfaced a single headline score; it was removed once it proved partly circular (heavier coverage now drives more coverage later) and replaced with a public Track Record page that shows every figure alongside its sample size, with thinly sampled figures flagged rather than hidden.
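The "every figure alongside its sample size" rule could look roughly like the sketch below. The TrackRecordFigure shape, the use of a simple mean, and the MIN_SAMPLE threshold of 30 are all assumptions chosen for illustration, not values taken from the Track Record page.

```typescript
// Hypothetical shape of one published figure: the value never travels without its n.
type TrackRecordFigure = {
  label: string;
  value: number | null; // null when there is no data at all
  sampleSize: number;
  thin: boolean;        // true when the sample is below the threshold
};

// Assumed threshold below which a figure is flagged as thin.
const MIN_SAMPLE = 30;

// Build a figure (here a simple mean) together with its sample size and thin flag.
function buildFigure(label: string, values: number[], minSample: number = MIN_SAMPLE): TrackRecordFigure {
  const n = values.length;
  const mean = n === 0 ? null : values.reduce((sum, v) => sum + v, 0) / n;
  return { label, value: mean, sampleSize: n, thin: n < minSample };
}

// Render the figure so the denominator and any thin-sample flag are always visible.
function formatFigure(f: TrackRecordFigure): string {
  const shown = f.value === null ? "n/a" : f.value.toFixed(1);
  const flag = f.thin ? " [thin sample]" : "";
  return `${f.label}: ${shown} (n=${f.sampleSize})${flag}`;
}

// Placeholder values, not real track-record data.
const figure = buildFigure("Example metric", [10, 20, 30]);
console.log(formatFigure(figure)); // Example metric: 20.0 (n=3) [thin sample]
```

Keeping the flag on the figure itself, rather than filtering thin figures out upstream, matches the stated choice to flag thin numbers instead of hiding them.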

Why it’s interesting as an Applied-AI project

Eval-first. No LLM stage ships without an eval. A bake-off showed that an open-weight model matched the paid model on classification and beat it on extraction, so both stages now run behind an eval-gated, per-use-case model router at roughly $0/mo.
Observability-first. Every LLM call is traced and costed (Langfuse). The cautionary tale — tracing silently off in production for months because of an unset env var — is the reason for a boot-time assert on telemetry connectivity.
Honesty as architecture. The one non-circular accuracy metric — lead time against an independent provider’s confirmed transfers — is what gets surfaced, with denominators always visible.
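An eval-gated, per-use-case router of the kind described in the first point can be sketched as follows. Everything concrete here is hypothetical: the model names, the gate thresholds, the costs, and the eval scores are placeholders rather than the bake-off's results, and "cheapest model that clears the gate" is one plausible selection policy, not necessarily the one in use.

```typescript
type UseCase = "classification" | "extraction";

// Hypothetical candidate record: one eval score per use case, plus a monthly cost.
type Candidate = {
  model: string;
  costPerMonthUsd: number;
  evalScore: Record<UseCase, number>;
};

// Assumed minimum eval score a model must reach before it may serve a use case.
const GATES: Record<UseCase, number> = { classification: 0.9, extraction: 0.85 };

// Pick the cheapest candidate that clears the gate for this use case.
// Throws when no candidate passes, so an ungated model is never routed to.
function route(useCase: UseCase, candidates: Candidate[]): Candidate {
  const passing = candidates.filter((c) => c.evalScore[useCase] >= GATES[useCase]);
  if (passing.length === 0) {
    throw new Error(`No model passes the eval gate for ${useCase}`);
  }
  return passing.reduce((best, c) => (c.costPerMonthUsd < best.costPerMonthUsd ? c : best));
}

// Placeholder candidates with made-up scores.
const candidates: Candidate[] = [
  { model: "open-weight-model", costPerMonthUsd: 0, evalScore: { classification: 0.93, extraction: 0.91 } },
  { model: "paid-model", costPerMonthUsd: 20, evalScore: { classification: 0.93, extraction: 0.88 } },
];

console.log(route("classification", candidates).model);
console.log(route("extraction", candidates).model);
```

Failing loudly when nothing clears the gate keeps the "no LLM stage ships without an eval" rule enforceable in code rather than by convention.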

Stack

Next.js + TypeScript on Vercel, Supabase / Postgres with row-level security, an RSS-to-LLM extraction pipeline using LangChain for structured-output parsing (not agents), Langfuse for tracing, and Sportmonks as the independent ground-truth provider that confirms when a tracked rumour becomes a completed transfer.

Linked writing

Two case studies are queued under /writing/ — driving the LLM pipeline to ~$0 with an eval-first model router, and linking World Cup performance to transfer rumours honestly. Until they publish, this page is the canonical pointer.