AI Engineering Stack Map

This page is the mental map for the whole guide. The key idea is simple: many AI tools look similar because they all contain planning, execution, review, and iteration. They differ because they operate at different layers of the AI engineering stack.

If you compare tools at the wrong layer, every answer becomes fuzzy. LangGraph, Hermes, AWS AI-DLC, Spec Kit, and MCP can all appear in an "agentic" system, but they do not own the same problem.

The stack

```mermaid
flowchart TB
    A[Business intent and product risk] --> B[Workflow and methodology]
    B --> C[Agent harness or coding runtime]
    C --> D[Agent app framework]
    D --> E[Tools and protocols]
    D --> F[Data, RAG, retrieval]
    E --> G[Model/provider/serving]
    F --> G
    H[Evals and observability] -. cross-cutting .-> B
    H -. cross-cutting .-> C
    H -. cross-cutting .-> D
    I[Security and governance] -. cross-cutting .-> B
    I -. cross-cutting .-> C
    I -. cross-cutting .-> D
    I -. cross-cutting .-> E
```

Layer map

| Layer | Examples | Main question | Output |
| --- | --- | --- | --- |
| Workflow/methodology | Spec Kit, OpenSpec, AWS AI-DLC, GSD, Superpowers | How should humans and agents deliver software? | Specs, plans, approvals, tests, reviews |
| Agent harness/runtime | Codex CLI, Claude Code, Hermes | How does an agent run with tools, memory, filesystem, and subagents? | Tool calls, code changes, terminal actions |
| Agent app framework | LangChain, LangGraph, LlamaIndex, Semantic Kernel | How do we build an AI application or long-running agent service? | Chains, graphs, agents, state machines |
| Tool/protocol layer | MCP, OpenAPI tools, function calling, tool gateways | What can the model safely do? | Tool schemas, policies, audit events |
| Data/RAG layer | Embeddings, vector DBs, retrievers, rerankers | What knowledge does the AI use? | Indexed content, retrieved context, citations |
| Model/serving layer | OpenAI, Anthropic, local LLMs, vLLM, Ollama, LiteLLM | Which model runs inference, and how? | Responses, tool calls, token/cost/latency profile |
| Evals/observability | LangSmith, Langfuse, Phoenix, OpenTelemetry | How do we know the system behaves correctly? | Traces, eval scores, regression results |
| Security/governance | RBAC, sandboxing, audit logs, approval gates | What is allowed, reviewed, and accountable? | Policies, logs, approvals, risk records |
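
The layer map can also be read as a lookup: find the layer a tool sits in before comparing it with another tool. The following is a minimal sketch of that idea, not part of any framework. The names `LAYER_EXAMPLES`, `layer_of`, and `same_layer` are hypothetical, and the data is copied from the Examples column above.

```python
# Illustrative only: layer names and examples come from the layer map.
# LAYER_EXAMPLES, layer_of, and same_layer are hypothetical helper names.
LAYER_EXAMPLES = {
    "Workflow/methodology": ["Spec Kit", "OpenSpec", "AWS AI-DLC", "GSD", "Superpowers"],
    "Agent harness/runtime": ["Codex CLI", "Claude Code", "Hermes"],
    "Agent app framework": ["LangChain", "LangGraph", "LlamaIndex", "Semantic Kernel"],
    "Tool/protocol layer": ["MCP", "OpenAPI tools", "function calling", "tool gateways"],
    "Data/RAG layer": ["Embeddings", "vector DBs", "retrievers", "rerankers"],
    "Model/serving layer": ["OpenAI", "Anthropic", "local LLMs", "vLLM", "Ollama", "LiteLLM"],
    "Evals/observability": ["LangSmith", "Langfuse", "Phoenix", "OpenTelemetry"],
    "Security/governance": ["RBAC", "sandboxing", "audit logs", "approval gates"],
}


def layer_of(tool):
    """Return the layer that lists `tool` as an example, or None if it is not in the map."""
    wanted = tool.casefold()
    for layer, examples in LAYER_EXAMPLES.items():
        if any(wanted == example.casefold() for example in examples):
            return layer
    return None


def same_layer(tool_a, tool_b):
    """True only when both tools are in the map and share a layer."""
    layer_a, layer_b = layer_of(tool_a), layer_of(tool_b)
    return layer_a is not None and layer_a == layer_b


print(layer_of("LangGraph"))            # Agent app framework
print(same_layer("LangGraph", "MCP"))   # False: different layers, so not a like-for-like comparison
```

A `False` result does not mean the tools are unrelated. It means a direct "which is better" comparison is probably the wrong question, because the two tools answer different main questions.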

Why the confusion happens

Different layers reuse similar verbs:

| Verb | In workflow frameworks | In app frameworks | In harnesses |
| --- | --- | --- | --- |
| Plan | Plan a feature or delivery unit | Plan agent execution path | Plan terminal/file actions |
| Implement | Generate code from specs/tasks | Execute a node/chain/tool path | Edit files and run commands |
| Review | Review spec, design, code, evidence | Evaluate outputs and traces | Inspect diffs, tests, logs |
| Iterate | Change artifacts and implementation | Improve prompts, tools, graphs | Retry tasks with better context |

The verbs are similar. The ownership is different.

This is why "plan -> implement -> review" is not enough to classify a framework. You must ask what artifact the framework owns, what risk it controls, and who is accountable for the result.

How to read this guide

  1. Use this page to identify the layer you are solving.
  2. Use the deep-dive pages to understand each framework.
  3. Use the comparison pages to choose a default workflow.
  4. Use the reference architectures to combine layers safely.

The shortest decision rule

| Problem | Start here |
| --- | --- |
| Requirements are vague | Spec Kit or OpenSpec |
| Enterprise delivery needs audit and approval | AWS AI-DLC |
| Long project execution is losing context | GSD |
| Agents code without engineering discipline | Superpowers |
| You need an open/custom agent harness | Hermes |
| You are building a RAG or tool-calling AI app | LangChain |
| You are building a stateful long-running agent app | LangGraph |
| You need production assurance | Evals, observability, security, governance |
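
Teams often face more than one of these problems at once. The sketch below encodes the decision rule as data and returns the starting points for a set of problems, in the order the rule lists them. `DECISION_RULES` and `starting_points` are hypothetical names used for illustration, and the rows are copied from the rule above.

```python
# Illustrative only: rows copied from the decision rule, in its original order.
# DECISION_RULES and starting_points are hypothetical names.
DECISION_RULES = {
    "Requirements are vague": "Spec Kit or OpenSpec",
    "Enterprise delivery needs audit and approval": "AWS AI-DLC",
    "Long project execution is losing context": "GSD",
    "Agents code without engineering discipline": "Superpowers",
    "You need an open/custom agent harness": "Hermes",
    "You are building a RAG or tool-calling AI app": "LangChain",
    "You are building a stateful long-running agent app": "LangGraph",
    "You need production assurance": "Evals, observability, security, governance",
}


def starting_points(problems):
    """Return the starting point for each given problem, ordered as in the rule."""
    unknown = [p for p in problems if p not in DECISION_RULES]
    if unknown:
        # Fail loudly rather than guess a layer for an unlisted problem.
        raise KeyError(f"No rule for: {unknown}")
    wanted = set(problems)
    # Dicts preserve insertion order, so results follow the rule's order.
    return [start for problem, start in DECISION_RULES.items() if problem in wanted]


print(starting_points([
    "You are building a stateful long-running agent app",
    "Requirements are vague",
]))
```

Raising on an unknown problem is a deliberate choice in this sketch: a problem the rule does not list should send you back to the layer map, not to a guessed default.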

Practical example

For a production support agent, the layers might look like this:

```mermaid
flowchart TB
    A[AWS AI-DLC] --> B[Delivery governance and approval]
    C[OpenSpec] --> D[Change proposal for one agent capability]
    E[LangGraph] --> F[Stateful support-agent runtime]
    G[MCP/tool gateway] --> H[CRM, ticketing, knowledge-base tools]
    I[RAG layer] --> J[Policy and product documentation]
    K[Model router] --> L[Hosted and local models]
    M[LangSmith/Langfuse/Phoenix] --> N[Traces and evals]
    O[Security controls] --> P[Tool approval, audit, memory retention]
```

No single framework owns all of that. A mature team combines a small number of layers with clear boundaries.
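
One way to make "clear boundaries" checkable is to write the stack down as component-to-responsibility assignments and flag any responsibility claimed by more than one component. The following is a rough sketch under that assumption. `Assignment`, `SUPPORT_AGENT_STACK`, and `find_overlaps` are hypothetical names, and the pairs mirror the support-agent example above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One component and the responsibility it owns (hypothetical structure)."""
    component: str
    responsibility: str


# Pairs taken from the production support agent example.
SUPPORT_AGENT_STACK = [
    Assignment("AWS AI-DLC", "Delivery governance and approval"),
    Assignment("OpenSpec", "Change proposal for one agent capability"),
    Assignment("LangGraph", "Stateful support-agent runtime"),
    Assignment("MCP/tool gateway", "CRM, ticketing, knowledge-base tools"),
    Assignment("RAG layer", "Policy and product documentation"),
    Assignment("Model router", "Hosted and local models"),
    Assignment("LangSmith/Langfuse/Phoenix", "Traces and evals"),
    Assignment("Security controls", "Tool approval, audit, memory retention"),
]


def find_overlaps(assignments):
    """Map each responsibility with more than one owner to its sorted owners."""
    owners = defaultdict(set)
    for assignment in assignments:
        owners[assignment.responsibility].add(assignment.component)
    return {resp: sorted(comps) for resp, comps in owners.items() if len(comps) > 1}


print(find_overlaps(SUPPORT_AGENT_STACK))  # {} because every responsibility has one owner

# Adding a second runtime for the same responsibility blurs the boundary.
blurred = SUPPORT_AGENT_STACK + [Assignment("LangChain", "Stateful support-agent runtime")]
print(find_overlaps(blurred))
```

The check only compares exact responsibility strings, so it catches declared overlaps, not subtle ones. It is a prompt for a design conversation rather than a guarantee.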

Built as a static bilingual AI engineering stack guide.