The Applied Layer

Cost & Platform Landscape

What production actually costs in 2026 — and which platform fits your workload

Pillar 4 of 5 · 2 pieces filed

Model inference, the headline cost, is a minority and shrinking share of what enterprises actually spend to run generative AI in production. The production evidence reviewed here consistently shows inference accounting for 20–40% of run-rate cost in mature deployments. Retrieval, evaluation, observability, governance, and human review consume the remaining 60–80% and are largely invisible at planning time.
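
The 20–40% range implies a multiplier on any inference bill. The arithmetic can be sketched in a few lines of Python; the function name and the monthly figure of 10,000 are illustrative assumptions, not numbers from the research.

```python
def implied_total_run_rate(inference_cost: float, inference_share: float) -> float:
    """Total run-rate cost implied by an inference bill and its share of the total."""
    if not 0.0 < inference_share <= 1.0:
        raise ValueError("inference_share must be in (0, 1]")
    return inference_cost / inference_share


# Hypothetical monthly inference bill, in any currency unit.
monthly_inference = 10_000.0

# Share bounds taken from the 20-40% range cited in the text.
total_if_40_percent = implied_total_run_rate(monthly_inference, 0.40)
total_if_20_percent = implied_total_run_rate(monthly_inference, 0.20)

print(f"Implied total: {total_if_40_percent:,.0f} to {total_if_20_percent:,.0f}")
print(
    f"Non-inference: {total_if_40_percent - monthly_inference:,.0f} "
    f"to {total_if_20_percent - monthly_inference:,.0f}"
)
```

Under these assumptions the full run rate is 2.5 to 5 times the inference bill, so a budget sized from a per-token calculator alone would cover well under half of the spend.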

Key findings

  • Finding 1: the cost stack. Eight categories make up the full enterprise-AI cost stack: model inference, retrieval infrastructure, evaluation, observability, governance, human-in-the-loop, integration, and change management. Most vendor calculators meter only one of them, model inference.
  • Finding 2: misleading headlines. Klarna, the IBM survey of cancelled pilots, and the BloombergGPT trajectory illustrate the same pattern: ROI claims computed from the visible costs while most of the cost base stays invisible. The cause is incomplete accounting, not dishonesty.
  • Finding 3: the global frontier. Chinese open-weight models (DeepSeek V3, Qwen3, Kimi K2) have closed the benchmark gap on most non-frontier workloads to within procurement-relevant tolerances, at substantially lower per-token list prices. This does not simplify platform selection; it shifts the decisive factors toward non-model dimensions.
  • Finding 4: platform selection. Platform choice is governed by three forces: platform gravity (where data lives), identity gravity (where the workforce authenticates), and regulatory gravity (where data-residency commitments are binding). Raw capability comparison is the last determinant, not the first.
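
The ordering in Finding 4 can be sketched as a filter-then-rank procedure. This is a simplified illustration only: it treats all three gravities as hard constraints, which is an assumption (in practice platform and identity gravity may be softer than regulatory gravity), and every platform name, provider label, and score below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    data_clouds: frozenset          # clouds where the platform runs next to the data
    identity_providers: frozenset   # identity systems it integrates with
    residency_regions: frozenset    # regions where it can commit to data residency
    capability_score: float         # hypothetical benchmark aggregate, higher is better


@dataclass(frozen=True)
class Workload:
    data_cloud: str
    identity_provider: str
    required_region: str


def shortlist(platforms, workload):
    """Apply the three gravity constraints first, then rank survivors by capability."""
    eligible = [
        p for p in platforms
        if workload.data_cloud in p.data_clouds                 # platform gravity
        and workload.identity_provider in p.identity_providers  # identity gravity
        and workload.required_region in p.residency_regions     # regulatory gravity
    ]
    # Capability is only a tie-breaker among platforms that passed every constraint.
    return sorted(eligible, key=lambda p: p.capability_score, reverse=True)


# Hypothetical candidates; platform-b has the best score but no EU residency.
candidates = [
    Platform("platform-a", frozenset({"cloud-x"}), frozenset({"idp-1"}),
             frozenset({"eu"}), 0.80),
    Platform("platform-b", frozenset({"cloud-x", "cloud-y"}), frozenset({"idp-1"}),
             frozenset({"us"}), 0.95),
    Platform("platform-c", frozenset({"cloud-x"}), frozenset({"idp-1", "idp-2"}),
             frozenset({"eu", "us"}), 0.70),
]
workload = Workload(data_cloud="cloud-x", identity_provider="idp-1", required_region="eu")

ranked = shortlist(candidates, workload)
print([p.name for p in ranked])  # ['platform-a', 'platform-c']
```

The highest-scoring candidate never reaches the ranking step because it fails the residency constraint, which is the sense in which capability comparison comes last.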

