The Applied Layer

Evaluation & Governance

How to know enterprise AI works, and how to ship it safely. Operational practice, not slogans.

Pillar 5 of 5 · 2 pieces filed

Key findings

  • Evaluation in many organisations is still confused with model benchmarking, leaving production failure modes (hallucinated citations, prompt injection, drift, vendor liability) unmonitored until they appear in court records.
  • Production failures (Moffatt v. Air Canada, Mata v. Avianca, EEOC v. iTutorGroup, Lacey v. State Farm, the DPD chatbot, Chevrolet of Watsonville) are governance failures more often than they are model failures.
  • Three governance archetypes (Compliance-Led, Risk-Led, Engineering-Led) emerge from the public record, and a four-level combined maturity ladder describes the path to operational trust.
  • Online evaluation matters more than offline evaluation once a system is live. The audit trail is part of the product, not a compliance afterthought.
  • Lifecycle management for prompts is the same problem as lifecycle management for code, and the operating model that succeeds at one tends to succeed at the other.

From the anchor research

Filed under Evaluation & Governance

2 pieces filed under this pillar.