Trust, Evaluation & Governance
Evaluation as practice, governance as delivery — the two disciplines that decide whether enterprise AI earns operational trust
Pillar 5 of 5 · 2 pieces filed
Enterprise AI systems fail in production in characteristic ways. The documented failures reviewed here (Air Canada, Avianca, iTutorGroup, State Farm, DPD, Chevrolet of Watsonville) share one pattern: the absence of two operational disciplines. The first is evaluation, the ongoing measurement of whether systems perform as intended. The second is governance, the delivery of policy as code, controls, and accountable workflows.
Key findings
- Central argument. Evaluation and governance are not parallel disciplines; they are the same operational system viewed from different angles. Evaluation gates are governance components; drift monitoring is governance evidence. Governing without operational evaluation produces documents; evaluating without governance produces dashboards that no one acts on.
- Evaluation (Part A). Seven evaluation dimensions — correctness, faithfulness, relevance, safety, latency, cost, and business outcome — must all be measured. Five methods — golden datasets, LLM-as-judge, human-in-the-loop, online evaluation, and adversarial red-teaming — each address different dimensions and failure modes. Observability is the prerequisite for all of them.
- Governance (Part B). Seven operational components constitute a working governance system. Three archetypes (Compliance-Led, Risk-Led, Engineering-Led) each succeed in some contexts and fail in others; the most effective operational pattern combines Engineering-Led implementation, Risk-Led prioritisation, and Compliance-Led disclosure.
- Maturity (Part C). A four-level combined maturity framework provides an operational yardstick. Most enterprise AI programmes in 2026 sit at Level 1 or low Level 2, even in organisations that operate at Level 3–4 for traditional software delivery. The gap between software delivery capability and AI delivery capability is real.
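To make the claim that "evaluation gates are governance components" concrete, here is a minimal sketch of a pre-deployment gate, assuming a small golden dataset and three of the seven dimensions (correctness, latency, cost). Everything named here (`Case`, `GateResult`, `evaluate_gate`, the threshold values, and the exact-match correctness check) is hypothetical and illustrative, not a method from the anchor research. The point is the shape: the gate yields a pass/fail decision that can block a release, plus a record of scores and failed dimensions that doubles as governance evidence.

```python
import math
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Case:
    """One golden-dataset example with the observed output (hypothetical schema)."""
    expected: str
    actual: str
    latency_ms: float
    cost_usd: float


@dataclass
class GateResult:
    """Gate decision plus the evidence behind it."""
    passed: bool
    scores: dict
    failures: list = field(default_factory=list)


# Hypothetical thresholds; in practice these would be set per use case by risk owners.
THRESHOLDS = {
    "correctness_min": 0.90,
    "latency_p95_ms_max": 2000.0,
    "cost_per_call_usd_max": 0.01,
}


def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def evaluate_gate(cases, thresholds=THRESHOLDS):
    """Score a candidate against the golden dataset and decide pass/fail."""
    if not cases:
        raise ValueError("golden dataset is empty")
    scores = {
        # Exact match is a crude stand-in for correctness; real programmes
        # typically add LLM-as-judge or human review for open-ended answers.
        "correctness": mean(
            1.0 if c.actual.strip() == c.expected.strip() else 0.0 for c in cases
        ),
        "latency_p95_ms": p95([c.latency_ms for c in cases]),
        "cost_per_call_usd": mean(c.cost_usd for c in cases),
    }
    failures = []
    if scores["correctness"] < thresholds["correctness_min"]:
        failures.append("correctness")
    if scores["latency_p95_ms"] > thresholds["latency_p95_ms_max"]:
        failures.append("latency")
    if scores["cost_per_call_usd"] > thresholds["cost_per_call_usd_max"]:
        failures.append("cost")
    return GateResult(passed=not failures, scores=scores, failures=failures)


# Usage with two illustrative cases: one correct answer, one wrong policy answer.
golden = [
    Case("Refunds are issued within 30 days.", "Refunds are issued within 30 days.", 850, 0.004),
    Case("Discounts must be requested before purchase.", "Discounts can be claimed after purchase.", 920, 0.004),
]
result = evaluate_gate(golden)
print(result.passed, result.failures)  # False ['correctness']
```

The design choice worth noting is that the gate returns its scores and failed dimensions rather than just a boolean: the same object that blocks a release is what an auditor or risk owner would later inspect.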
From the anchor research
Filed under Trust, Evaluation & Governance
Executive briefing: Trust, Evaluation & Governance
How this pillar connects
- Beyond the Model: Provides the trust discipline that makes the applied layer safe enough for operational use.
- Production AI Architecture: Depends on architecture that exposes traces, test surfaces, guardrails, and rollback paths.
- Operating Models: Requires clear ownership so evaluation failures trigger delivery action rather than passive reporting.
- Cost & Platform Landscape: Makes hidden costs visible through continuous measurement, human review, and compliance evidence.
