Evaluation and Governance
How to know enterprise AI works, and how to ship it safely. Operational practice, not slogans.
Eval frameworks, online evaluation, and observability stacks for LLM-backed systems running in real environments.
The patterns that distinguish production AI from demos.
Two disciplines determine whether enterprise AI earns operational trust: evaluation, the practice of measuring whether a system actually works in production; and governance, the delivery of policy as code, controls, and accountable workflows. Both remain underspecified. Evaluation in many organizations still means informal spot checks before launch rather than a measured, repeatable practice.
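To make "policy as code" concrete, here is a minimal sketch in Python of a rule engine that gates a model response and writes an audit record for every decision. All the names (`Rule`, `PolicyEngine`, the two example rules) are hypothetical, chosen for illustration rather than taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Rule:
    """One named predicate over a model output."""
    name: str
    check: Callable[[str], bool]


@dataclass
class PolicyEngine:
    rules: list[Rule]
    audit_log: list[dict] = field(default_factory=list)

    def enforce(self, output: str) -> bool:
        """Run every rule and record the verdict, so each decision is accountable."""
        failures = [r.name for r in self.rules if not r.check(output)]
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "passed": not failures,
            "failed_rules": failures,
        })
        return not failures


# Hypothetical rules: block an output that leaks a marker string, cap its length.
engine = PolicyEngine(rules=[
    Rule("no_ssn_marker", lambda out: "SSN:" not in out),
    Rule("max_length", lambda out: len(out) <= 4000),
])

if engine.enforce("Here is the summary you asked for."):
    print("release output")
else:
    print("block and escalate:", engine.audit_log[-1])
```

The point is not the two toy rules but the shape: rules live in version control, every verdict leaves an audit trail, and a failed check routes to an accountable workflow instead of silently shipping.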
By 2026, enterprise AI systems are no longer differentiated primarily by which large language model they use. The frontier models (Anthropic's Claude Opus 4.7, OpenAI's GPT-5.2, Google's Gemini 3 Pro) are converging on capability for the median enterprise workload. What separates production-grade systems from demos is everything built around the model.
The most consequential layer of the AI buildout is not the foundation models themselves but what sits between them and the organizations that deploy them: architecture, integration, evaluation, and governance. The public record has clarified the picture rather than settled it; the applied layer is still taking shape.
Morgan Stanley shipped two assistants in eighteen months. The visible artefact in both cases was the model. The invisible artefact, the part that decided whether the rollouts compounded, was the evaluation harness underneath.
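Reduced to its skeleton, such a harness is a fixed case set, a grader per case, and a gate on the aggregate score. The sketch below uses toy stand-ins (`EvalCase`, `run_harness`, and the `assistant` stub are hypothetical, not Morgan Stanley's tooling), but the gate-before-release shape is the part that compounds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    grader: Callable[[str], float]  # returns a score in [0, 1]


def run_harness(system: Callable[[str], str],
                cases: list[EvalCase],
                threshold: float = 0.9) -> bool:
    """Score the system on a fixed case set and gate the rollout on the mean."""
    scores = [case.grader(system(case.prompt)) for case in cases]
    mean = sum(scores) / len(scores)
    print(f"mean score {mean:.2f} over {len(cases)} cases")
    return mean >= threshold


# Hypothetical stand-ins for the real assistant and graders.
def assistant(prompt: str) -> str:
    return "42"


cases = [
    EvalCase("What is 6 x 7?", lambda out: 1.0 if "42" in out else 0.0),
]

if run_harness(assistant, cases):
    print("gate passed: promote the release")
else:
    print("gate failed: hold the release")
```

Because the cases and graders are versioned alongside the system, every rollout decision is reproducible; that is what lets successive releases compound rather than regress.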