AI Engineering Maturity Model

Maturity matters because different teams need different amounts of control. A solo prototype and a regulated enterprise agent should not use the same process.

The goal is not to reach the highest level immediately. The goal is to choose the right level for risk, team size, and production impact.

Why maturity matters

Low maturity is not always bad. It is appropriate for exploration. High maturity is not always good. It can become process drag when applied to low-risk work.

The correct question is:

What level of repeatability, evidence, and control does this AI workflow need?

Levels

```mermaid
flowchart LR
    L0[0 Prompt-only] --> L1[1 Individual assisted coding]
    L1 --> L2[2 Team workflow discipline]
    L2 --> L3[3 Product AI app engineering]
    L3 --> L4[4 Platformed AI engineering]
    L4 --> L5[5 Governed enterprise AI-DLC]
```

| Level | Name | Description |
|---|---|---|
| 0 | Prompt-only experimentation | AI use happens in ad hoc chats with little repeatability |
| 1 | Individual assisted coding | Developers use Codex/Claude/Cursor-style tools personally |
| 2 | Team workflow discipline | Specs, tests, reviews, and shared prompts become standard |
| 3 | Product AI app engineering | RAG, tools, evals, observability, and CI gates exist |
| 4 | Platformed AI engineering | Shared model routing, tool gateways, policies, and templates exist |
| 5 | Governed enterprise AI-DLC | Risk-tiered governance, audit, approvals, SLOs, and operations loops exist |

Capability matrix

| Capability | L0 | L1 | L2 | L3 | L4 | L5 |
|---|---|---|---|---|---|---|
| Shared specs | no | optional | yes | yes | standardized | governed |
| Coding harness | no | personal | team recommended | integrated | platformed | governed |
| Tests | optional | developer-owned | required | required | policy-driven | audited |
| RAG/data pipeline | no | no | optional | yes | reusable platform | governed |
| Tool gateway | no | no | optional | per app | shared | governed |
| Evals | no | no | basic | CI gate | platform service | audit evidence |
| Observability | no | basic logs | CI/test logs | traces | shared telemetry | SLO and incident loop |
| Security governance | informal | personal judgment | team checklist | app controls | platform policy | risk-tiered AI-DLC |

| Team type | Recommended path |
|---|---|
| Solo builder | Level 0 -> Level 1, add OpenSpec only for larger changes |
| Startup product team | Level 1 -> Level 2, add Spec Kit/OpenSpec and Superpowers discipline |
| SaaS team building RAG | Level 2 -> Level 3, add LangChain, RAG evals, observability |
| Platform engineering team | Level 3 -> Level 4, add model router, tool gateway, templates |
| Regulated enterprise | Level 3/4 -> Level 5, add AWS AI-DLC-style gates and audit |
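
One way to make the capability matrix actionable is to hold it as data and ask what changes between a team's current level and its target level. The sketch below is illustrative only: the cell values are copied from the matrix, but the names `CAPABILITY_MATRIX`, `expectations_at`, and `changes_between` are hypothetical and not part of any tool named in this guide.

```python
from typing import Dict, List, Tuple

# Rows copied from the capability matrix; each list is indexed by level 0..5.
CAPABILITY_MATRIX: Dict[str, List[str]] = {
    "Shared specs": ["no", "optional", "yes", "yes", "standardized", "governed"],
    "Coding harness": ["no", "personal", "team recommended", "integrated", "platformed", "governed"],
    "Tests": ["optional", "developer-owned", "required", "required", "policy-driven", "audited"],
    "RAG/data pipeline": ["no", "no", "optional", "yes", "reusable platform", "governed"],
    "Tool gateway": ["no", "no", "optional", "per app", "shared", "governed"],
    "Evals": ["no", "no", "basic", "CI gate", "platform service", "audit evidence"],
    "Observability": ["no", "basic logs", "CI/test logs", "traces", "shared telemetry", "SLO and incident loop"],
    "Security governance": ["informal", "personal judgment", "team checklist", "app controls", "platform policy", "risk-tiered AI-DLC"],
}


def expectations_at(level: int) -> Dict[str, str]:
    """Return what the matrix expects for every capability at one level."""
    if not 0 <= level <= 5:
        raise ValueError(f"level must be between 0 and 5, got {level}")
    return {cap: cells[level] for cap, cells in CAPABILITY_MATRIX.items()}


def changes_between(current: int, target: int) -> Dict[str, Tuple[str, str]]:
    """Return only the capabilities whose expectation differs between two levels."""
    before = expectations_at(current)
    after = expectations_at(target)
    return {
        cap: (before[cap], after[cap])
        for cap in before
        if before[cap] != after[cap]
    }


if __name__ == "__main__":
    # Example: a SaaS team moving from Level 2 to Level 3.
    for cap, (old, new) in changes_between(2, 3).items():
        print(f"{cap}: {old} -> {new}")
```

Listing only the rows that change keeps the upgrade conversation focused: for Level 2 to Level 3, shared specs and tests already meet the bar, so the work is in the remaining capabilities.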

Signs you are over-engineering

  • Every typo fix requires a long AI-DLC flow.
  • The team writes specs no one reads.
  • Evals exist but do not represent user failures.
  • Model routing exists before there is more than one real model policy to route between.
  • A tool gateway exists, but the only tool behind it is a single safe, read-only one.
  • Developers bypass the process because it adds no useful evidence.

Signs you are under-engineering

  • Agents edit important code without tests.
  • RAG answers are trusted without retrieval evals.
  • Tool calls are not logged.
  • Sensitive data enters prompts without classification.
  • No one can say which model served the requests involved in a production incident.
  • Approvals happen in chat with no durable audit record.
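
Several of the signs above (unlogged tool calls, unknown model attribution, no durable record) come down to missing audit records. As a minimal sketch, assuming tools are plain Python functions, a decorator can write one structured record per call. `AUDIT_LOG`, `audited_tool`, `lookup_order`, and the model name `example-model-v1` are hypothetical, and the in-memory list stands in for whatever durable store a real system would use.

```python
import functools
import json
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink; a real system would append to durable storage.
AUDIT_LOG: List[str] = []


def audited_tool(name: str, model: str) -> Callable:
    """Wrap a tool function so every call leaves one JSON audit record."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {
                "tool": name,
                "model": model,  # which model requested the call
                "args": repr(args),
                "kwargs": repr(kwargs),
                "started_at": time.time(),
            }
            try:
                result = fn(*args, **kwargs)
                record["status"] = "ok"
                return result
            except Exception as exc:
                record["status"] = "error"
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so no call goes unrecorded.
                AUDIT_LOG.append(json.dumps(record))

        return wrapper

    return decorator


# Assumption for illustration: the model name is fixed when the tool is registered.
@audited_tool(name="lookup_order", model="example-model-v1")
def lookup_order(order_id: str) -> Dict[str, str]:
    return {"order_id": order_id, "status": "shipped"}


if __name__ == "__main__":
    lookup_order("A1")
    print(AUDIT_LOG[-1])
```

Binding the model name at registration time is a simplification. In an application that routes between models, the name would more likely be passed per request.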

Upgrade sequence

  1. Standardize specs and test discipline first.
  2. Add traces before scaling AI apps.
  3. Add evals before you start changing models, retrievers, or prompts frequently.
  4. Add a tool gateway before exposing write-capable tools in production.
  5. Add AI-DLC governance when work is high-risk or multi-stakeholder.
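
Step 3 can be sketched as a small regression gate: run the same eval cases against the current system and the proposed one, and block the change if the pass rate drops. This is a hedged illustration, not a prescribed harness. `pass_rate`, `eval_gate`, the substring-match scoring, and the stub systems are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Each case is (prompt, substring the answer is expected to contain).
EvalCase = Tuple[str, str]


def pass_rate(system: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases whose expected substring appears in the system's answer."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for prompt, expected in cases if expected in system(prompt))
    return passed / len(cases)


def eval_gate(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    cases: List[EvalCase],
    max_regression: float = 0.0,
) -> bool:
    """Return True if the candidate is no worse than baseline minus the allowed regression."""
    return pass_rate(candidate, cases) >= pass_rate(baseline, cases) - max_regression


# Stub systems standing in for two model/prompt/retriever configurations.
CASES: List[EvalCase] = [
    ("What is the refund window?", "30 days"),
    ("How long does shipping take?", "5 days"),
]


def baseline(prompt: str) -> str:
    answers = {
        "What is the refund window?": "Refunds are accepted within 30 days.",
        "How long does shipping take?": "Shipping takes about 5 days.",
    }
    return answers.get(prompt, "")


def candidate(prompt: str) -> str:
    # This configuration answers the refund question but misses the shipping one.
    if "refund" in prompt:
        return "Refunds are accepted within 30 days."
    return "I am not sure."


if __name__ == "__main__":
    allowed = eval_gate(baseline, candidate, CASES)
    print("change allowed" if allowed else "change blocked")
```

Substring matching is the simplest possible scorer. The gate is only as useful as its cases, which is why the eval set should be drawn from real user failures.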
