Cost & Platform Landscape
What production actually costs in 2026 — and which platform fits your workload
Pillar 4 of 5 · 2 pieces filed
The headline cost of model inference is a small and shrinking fraction of what enterprises actually spend to run generative AI in production. The production evidence reviewed here consistently shows inference accounting for 20–40% of run-rate cost for mature deployments. Retrieval, evaluation, observability, governance, and human review consume the remainder — and are largely invisible at planning time.
Key findings
- Finding 1 — the cost stack. Eight categories constitute the full enterprise-AI cost stack: model inference, retrieval infrastructure, evaluation, observability, governance, human-in-the-loop, integration, and change management. Most vendor calculators meter one.
- Finding 2 — misleading headlines. Klarna, the IBM survey of cancelled pilots, and the BloombergGPT trajectory illustrate the same pattern: ROI claims built on visible costs against an invisible cost base. Not dishonesty — incomplete accounting.
- Finding 3 — the global frontier. Chinese open-weight models — DeepSeek V3, Qwen3, Kimi K2 — have closed the benchmark gap on most non-frontier workloads to within procurement-relevant tolerances at substantially lower per-token list prices. This does not simplify platform selection; it shifts the decisive factors toward non-model dimensions.
- Finding 4 — platform selection. Platform choice is governed by three forces — platform gravity (where data lives), identity gravity (where the workforce authenticates), and regulatory gravity (where data-residency commitments are binding). Raw capability comparison is the last determinant, not the first.
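The cost-stack arithmetic behind Finding 1 can be sketched in a few lines. This is a minimal illustration, not the research's own method: the function name `cost_shares` and every figure in `example_costs` are hypothetical, chosen only so that the inference share lands inside the 20–40% band quoted above. Only the eight category names come from the findings.

```python
# The eight cost categories named in Finding 1.
CATEGORIES = (
    "model_inference",
    "retrieval_infrastructure",
    "evaluation",
    "observability",
    "governance",
    "human_in_the_loop",
    "integration",
    "change_management",
)


def cost_shares(monthly_costs: dict[str, float]) -> dict[str, float]:
    """Return each category's share of total run-rate cost.

    Refuses to run if any category is missing, so an unmetered
    category cannot silently drop out of the total.
    """
    missing = [c for c in CATEGORIES if c not in monthly_costs]
    if missing:
        raise ValueError(f"unmetered categories: {', '.join(missing)}")
    total = sum(monthly_costs[c] for c in CATEGORIES)
    if total <= 0:
        raise ValueError("total run-rate cost must be positive")
    return {c: monthly_costs[c] / total for c in CATEGORIES}


# Hypothetical monthly figures in USD, for illustration only.
example_costs = {
    "model_inference": 30_000,
    "retrieval_infrastructure": 15_000,
    "evaluation": 10_000,
    "observability": 8_000,
    "governance": 7_000,
    "human_in_the_loop": 15_000,
    "integration": 10_000,
    "change_management": 5_000,
}

shares = cost_shares(example_costs)
print(f"inference share: {shares['model_inference']:.0%}")  # inference share: 30%
```

Raising an error on a missing category is a deliberate choice in this sketch: a calculator that meters only one category would fail the check rather than report inference as 100% of cost.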
From the anchor research
Filed under Cost & Platform Landscape
Executive briefing: Cost & Platform Landscape
How this pillar connects
- Beyond the Model: Quantifies why the applied layer, not the token price alone, determines production economics.
- Production AI Architecture: Maps architecture choices to latency, retrieval, observability, and evaluation costs.
- Operating Models: Shows how centralised or federated models change platform gravity, accountability, and run-rate spend.
- Trust, Evaluation & Governance: Links cost discipline to evidence: monitoring, controls, review effort, and audit-ready records.
