Production AI Architecture, Pillar 2 of 5

Key findings

Finding 1 — retrieval. Naive RAG fails in production for five predictable, documented reasons. Hybrid retrieval (BM25 + dense, fused with reciprocal rank fusion) plus cross-encoder reranking is the single highest-ROI architectural change available to most enterprise pipelines.
Finding 2 — agentic patterns. Agentic systems sit on a five-tier maturity ladder. Tiers 1–2 are broadly production-ready; Tiers 3–5 are workload-specific and frequently overclaimed in current practitioner discourse.
Finding 3 — anti-patterns. Six anti-patterns recur across documented production retrospectives — the evaluation cliff, debugging opacity, latency drift, integration impedance, the POC-to-production gap, and the verification vacuum — each with a diagnostic signature and a mitigation.
Finding 4 — decision framework. Pattern selection can be made almost deterministically given five workload characteristics: corpus characteristics, query complexity, latency budget, error tolerance, and verifiability surface.
Series link. These findings give the operating model (Pillar 3) its technical object, the cost analysis (Pillar 4) its cost drivers, and the governance practice (Pillar 5) its evaluation surface. The architecture is necessary but not sufficient — it reaches production only through the organisational conditions the later pillars examine.
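
As an illustration of the fusion step named in Finding 1, the sketch below shows one common way to implement reciprocal rank fusion over two ranked lists. It is a minimal example and is not taken from the report: the function name reciprocal_rank_fusion, the document IDs, and the constant k = 60 (a conventional default) are all assumptions, and the cross-encoder reranking stage that would follow is omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is a conventional default and is an
    assumption here, not a value from the report.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a lexical (BM25) retriever and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because the fusion uses only rank positions, the BM25 and dense scores never need to be normalised against each other. Documents that both retrievers rank highly rise to the top, and that fused shortlist is what a reranker would then reorder.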

Anchor research

Production AI Architecture

Patterns that distinguish production from demo — why the architecture wrapped around the model, not the model itself, decides enterprise AI outcomes.

Central thesis. By 2026 the frontier models have converged in capability for the median enterprise workload, so production AI quality is determined by architectural composition rather than model selection. The same model produces materially different outcomes in a well-architected system than in a naive pipeline.

Filed under Production AI Architecture

2 pieces filed under this pillar. Patrons read the full body.

16 Jun 2026 · Briefing · Locked
Executive briefing: Production AI Architecture