Evaluation & Governance
How to know enterprise AI works, and how to ship it safely. Operational practice, not slogans.
Pillar 5 of 5 · 2 pieces filed
Key findings
- Evaluation in many organisations is still confused with model benchmarking, leaving production failure modes (hallucinated citations, prompt injection, drift, vendor liability) unmonitored until they appear in court records.
- Production failures (Moffatt v. Air Canada, Mata v. Avianca, EEOC v. iTutorGroup, Lacey v. State Farm, the DPD chatbot, Chevrolet of Watsonville) are governance failures more often than they are model failures.
- Three governance archetypes (Compliance-Led, Risk-Led, Engineering-Led) emerge from the public record, and a four-level combined maturity ladder describes the path to operational trust.
- Online evaluation matters more than offline evaluation once a system is live. The audit trail is part of the product, not a compliance afterthought.
- Lifecycle management for prompts is the same problem as lifecycle management for code, and the operating model that succeeds at one tends to succeed at the other.
From the anchor research
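The last two findings, that the audit trail is part of the product and that prompts need the same lifecycle discipline as code, can be illustrated with a minimal sketch. Everything named here (`prompt_version`, `AuditRecord`, `AuditLog`, the `citations_grounded` check) is hypothetical and chosen for illustration; none of it comes from the anchor research. The sketch assumes prompts are versioned by content hash, much as commits are, and that each live response is logged with the results of whatever online checks the team runs.

```python
import hashlib
from dataclasses import dataclass, field


def prompt_version(template: str) -> str:
    """Content-addressed version id: identical text always yields the same id."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


@dataclass
class AuditRecord:
    """One logged production interaction (hypothetical schema)."""
    prompt_id: str
    prompt_version: str
    user_input: str
    model_output: str
    checks: dict = field(default_factory=dict)  # check name -> passed (bool)


class AuditLog:
    """In-memory audit trail; a real system would use durable storage."""

    def __init__(self) -> None:
        self.records: list[AuditRecord] = []

    def record(self, prompt_id: str, template: str, user_input: str,
               model_output: str, checks: dict) -> AuditRecord:
        rec = AuditRecord(
            prompt_id=prompt_id,
            prompt_version=prompt_version(template),
            user_input=user_input,
            model_output=model_output,
            checks=dict(checks),
        )
        self.records.append(rec)
        return rec

    def failure_rate(self, check_name: str) -> float:
        """Share of logged responses that failed a given online check."""
        relevant = [r for r in self.records if check_name in r.checks]
        if not relevant:
            return 0.0
        failed = sum(1 for r in relevant if not r.checks[check_name])
        return failed / len(relevant)


# Usage: log two live responses against the same prompt version.
TEMPLATE = "Answer using only the cited policy documents.\n\n{question}"
log = AuditLog()
log.record("refund-policy", TEMPLATE, "Can I get a refund?",
           "Yes, see policy section 4.", {"citations_grounded": True})
log.record("refund-policy", TEMPLATE, "What about bereavement fares?",
           "Yes, apply within 90 days.", {"citations_grounded": False})
print(log.failure_rate("citations_grounded"))
```

Because the version id is derived from the prompt text, any edit to the template produces a new id, so a rise in a check's failure rate can be traced to the prompt change that preceded it, in the same way a regression is traced to a commit.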
Filed under Evaluation & Governance
Evaluation and Governance
How this pillar connects
- The Applied Layer: Extends the trust dimension named in the manifesto.
- Architecture & Retrieval: Methodology and governance here, technical infrastructure there.
- Operating Models & What Success Looks Like: Operational governance practice; the organisational dimension lives in operating models.
- Economics & Platform Choice: Evaluation and governance costs are part of the cost stack.
