Reference Architectures
Reference architectures show how the layers combine in real systems. They are not product templates. They are decision maps: which layer owns which responsibility, which framework should lead, and where teams usually overbuild.
Architecture map
flowchart TB
A[Reference architecture choice] --> B[Team coding workflow]
A --> C[RAG product feature]
A --> D[Long-running agent app]
A --> E[Internal agent platform]
A --> F[Enterprise governed delivery]
B --> B1[Spec/OpenSpec + harness + discipline]
C --> C1[LangChain + RAG + evals]
D --> D1[LangGraph + AI-DLC + tool gateway]
E --> E1[Hermes + MCP + model router]
F --> F1[AI-DLC + gates + audit + ops readiness]

Architecture 1: AI-assisted product engineering team
Recommended stack:
Spec Kit or OpenSpec + Codex/Claude/Hermes + Superpowers-style TDD + CI

| Layer | Choice | Reason |
|---|---|---|
| Workflow | Spec Kit for larger features, OpenSpec for lightweight changes | keeps intent and implementation aligned |
| Harness | Codex CLI, Claude Code, or Hermes | performs coding, terminal, and repo operations |
| Discipline | Superpowers-style TDD/review | prevents the agent from writing code without tests |
| Verification | CI tests and PR review | turns AI output into normal engineering evidence |
Use this when the main problem is software delivery quality, not building a production AI application.
Step-by-step:
- Define when a change needs Spec Kit vs OpenSpec.
- Define a minimal spec template with goals, non-goals, acceptance criteria, and risks.
- Require TDD or test-first prompts for behavior changes.
- Use the harness for implementation only after the spec/change artifact is reviewed.
- Require CI and human PR review before merge.
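
The first and last steps above can be made mechanical. The following is a minimal sketch, not a Spec Kit or OpenSpec API: the `Change` fields, the 10-file threshold, and the function names are all hypothetical choices that a team would replace with its own policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    """Hypothetical summary of a proposed change (fields are assumptions)."""
    files_touched: int
    changes_behavior: bool
    adds_new_feature: bool


def choose_workflow(change: Change, large_change_files: int = 10) -> str:
    """Pick Spec Kit for larger features and OpenSpec for lightweight changes.

    The file-count threshold is an illustrative assumption, not a rule from
    either tool.
    """
    if change.adds_new_feature or change.files_touched >= large_change_files:
        return "spec-kit"
    return "openspec"


def merge_blockers(
    change: Change,
    spec_reviewed: bool,
    tests_written_first: bool,
    ci_passed: bool,
    human_approved: bool,
) -> List[str]:
    """Return the reasons a change may not merge; an empty list means ready."""
    blockers = []
    if not spec_reviewed:
        blockers.append("spec or change artifact not reviewed")
    if change.changes_behavior and not tests_written_first:
        blockers.append("behavior change without test-first evidence")
    if not ci_passed:
        blockers.append("CI has not passed")
    if not human_approved:
        blockers.append("no human PR review")
    return blockers
```

Returning a list of blockers rather than a single boolean keeps the gate explainable: the harness or CI job can show the author exactly which step was skipped.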
Architecture 2: Production RAG product feature
Recommended stack:
OpenSpec + LangChain + Data/RAG layer + evals/observability + CI eval gate

| Layer | Choice | Reason |
|---|---|---|
| Change control | OpenSpec | lightweight proposal and spec delta |
| AI app framework | LangChain | fast RAG and tool orchestration |
| Data layer | ingestion, parsing, chunking, vector/hybrid search | RAG quality depends on data pipeline |
| Observability | LangSmith, Langfuse, or Phoenix | traces and evals for retrieval/generation |
| CI gate | retrieval and answer evals | prevents prompt/retriever/model regressions |
Use this for chatbots, support assistants, documentation assistants, or knowledge search features.
Step-by-step:
- Define allowed sources and data owners.
- Create a golden dataset of user questions and expected evidence.
- Build a narrow RAG pipeline first.
- Add citations and refusal behavior.
- Add traces and evals before broad rollout.
- Add permission-aware retrieval before indexing or exposing sensitive data.
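
As an illustration of the golden dataset and CI eval gate, here is a minimal sketch in plain Python. It does not use LangChain, LangSmith, or any other library API. `GoldenCase`, `recall_at_k`, `retrieval_gate`, and the 0.8 threshold are hypothetical names and values, and `retrieve` stands in for whatever retriever the pipeline actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class GoldenCase:
    """One golden-dataset entry: a user question and its expected evidence."""
    question: str
    expected_sources: FrozenSet[str]  # IDs of documents that should be retrieved


def recall_at_k(retrieved: Sequence[str], expected: FrozenSet[str], k: int) -> float:
    """Fraction of expected sources that appear in the top-k retrieved IDs."""
    if not expected:
        return 1.0
    return len(set(retrieved[:k]) & expected) / len(expected)


def retrieval_gate(
    cases: Sequence[GoldenCase],
    retrieve: Callable[[str], Sequence[str]],
    k: int = 3,
    min_mean_recall: float = 0.8,  # assumed threshold; tune per product
) -> Tuple[bool, float]:
    """Return (passed, mean recall@k) over the golden dataset."""
    if not cases:
        raise ValueError("golden dataset is empty")
    scores = [recall_at_k(retrieve(c.question), c.expected_sources, k) for c in cases]
    mean = sum(scores) / len(scores)
    return mean >= min_mean_recall, mean
```

A CI job could call `retrieval_gate` with the real retriever and fail the build when the first element is `False`. Answer-quality evals would need a separate check; this sketch covers retrieval only.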
Architecture 3: Long-running agent service
Recommended stack:
AI-DLC + LangGraph + tool gateway + evals + audit logs

| Layer | Choice | Reason |
|---|---|---|
| Delivery governance | AWS AI-DLC | risk, approval, NFR, audit |
| Runtime app framework | LangGraph | stateful graph, human-in-the-loop, long-running execution |
| Tools | Tool gateway/MCP/OpenAPI | controlled external actions |
| Evaluation | node evals and trajectory evals | verifies state transitions and actions |
| Observability | traces and audit logs | production debugging and accountability |
Use this when the AI system performs multi-step work over time, needs memory/state, or can trigger external actions.
Step-by-step:
- Run AI-DLC inception for risk, stakeholders, NFRs, and approval model.
- Design LangGraph state and node boundaries.
- Classify tools and actions by risk.
- Gate write/destructive actions.
- Add node-level tests and full trajectory evals.
- Add traces, audit logs, and rollback/runbook procedures.
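
The steps on classifying tools by risk and gating write/destructive actions could look roughly like the sketch below. It is framework-independent and does not represent LangGraph or MCP APIs. `Risk`, `ToolGateway`, and the `approve` callback are hypothetical names, and a real system would route `approve` to a human-in-the-loop step.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Dict, List


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    DESTRUCTIVE = "destructive"


@dataclass
class ToolGateway:
    """Hypothetical gateway: every tool call is classified, gated, and logged."""
    handlers: Dict[str, Callable[..., Any]]
    risk_by_tool: Dict[str, Risk]
    approve: Callable[[str, Dict[str, Any]], bool]  # e.g. a human approval step
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def _record(self, tool: str, args: Dict[str, Any], outcome: str) -> None:
        self.audit_log.append({"tool": tool, "args": args, "outcome": outcome})

    def call(self, tool: str, **args: Any) -> Any:
        risk = self.risk_by_tool.get(tool)
        # Unclassified or unknown tools are denied by default.
        if risk is None or tool not in self.handlers:
            self._record(tool, args, "denied: unclassified tool")
            raise PermissionError(f"{tool} is not a classified tool")
        # Anything beyond read access needs explicit approval.
        if risk is not Risk.READ and not self.approve(tool, args):
            self._record(tool, args, "denied: approval required")
            raise PermissionError(f"{tool} requires approval")
        result = self.handlers[tool](**args)
        self._record(tool, args, "allowed")
        return result
```

Denials are logged as well as successful calls, so the audit log also shows what the agent attempted, which is useful when writing trajectory evals.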
Architecture 4: Internal agent platform with custom harness
Recommended stack:
Hermes + model router + MCP/tool gateway + OpenSpec + Superpowers-like skills

| Layer | Choice | Reason |
|---|---|---|
| Harness/runtime | Hermes | open/customizable agent runtime |
| Model layer | model router | hosted and local model control |
| Tool layer | MCP/tool gateway | standardized tool access and policy |
| Workflow | OpenSpec | lightweight change artifacts |
| Discipline | Superpowers-like skills | TDD, review, debugging, planning behavior |
Use this when you want to own the agent harness instead of depending only on managed coding CLIs.
Step-by-step:
- Define why Codex/Claude-style CLIs are not enough.
- Choose model routes by workload and data boundary.
- Add MCP/tool gateway before exposing internal systems.
- Pilot one repo with OpenSpec and a limited tool set.
- Add skills for planning, TDD, review, and debugging.
- Log tool calls and measure whether the custom harness improves outcomes.
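
To make "choose model routes by workload and data boundary" concrete, here is a small sketch. It is not a Hermes or model-router API. The boundary labels, workload names, and model names are placeholders, and the policy itself (restricted data stays on a local model) is an assumed example of a data-boundary rule.

```python
from dataclasses import dataclass

# Assumed data-boundary labels; replace with the organization's own classes.
BOUNDARIES = {"public", "internal", "restricted"}


@dataclass(frozen=True)
class Route:
    model: str    # placeholder model name, not a real product identifier
    hosted: bool  # True if the request leaves the local environment


def choose_route(workload: str, data_boundary: str) -> Route:
    """Pick a model route; the data boundary takes priority over the workload."""
    if data_boundary not in BOUNDARIES:
        raise ValueError(f"unknown data boundary: {data_boundary}")
    # Restricted data never goes to a hosted model in this example policy.
    if data_boundary == "restricted":
        return Route(model="local-model", hosted=False)
    # Heavier reasoning workloads get the larger hosted model.
    if workload in {"planning", "review"}:
        return Route(model="hosted-large-model", hosted=True)
    return Route(model="hosted-small-model", hosted=True)
```

Checking the data boundary before the workload means a routing mistake fails toward the more restrictive option.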
Architecture 5: Enterprise AI-DLC delivery system
Recommended stack:
AWS AI-DLC + Spec Kit/OpenSpec patterns + security governance + release readiness + observability

| Layer | Choice | Reason |
|---|---|---|
| Lifecycle | AWS AI-DLC | governs AI-driven delivery |
| Requirements | Spec Kit/OpenSpec concepts | clearer acceptance criteria and change deltas |
| Security | risk-tiered gates | prevents speed from outrunning accountability |
| Operations | release readiness and runbooks | closes the gap between construction and production |
| Observability | traces, CI, incident feedback | evidence for production behavior |
Use this when multiple stakeholders, high-risk systems, regulated domains, or platform teams need repeatable delivery governance.
Step-by-step:
- Define risk tiers.
- Define approval owners for product, architecture, security, and operations.
- Define required artifacts per tier.
- Use Spec Kit/OpenSpec patterns inside AI-DLC artifacts for clarity.
- Add construction verification: tests, evals, security checks.
- Add operations verification: rollout, rollback, monitoring, incident feedback.
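
Risk tiers, required artifacts, and approval owners can be expressed as data and checked before release. The sketch below is one possible encoding, not part of AWS AI-DLC. The tier names, artifact names, and the mapping between them are hypothetical and would come from the organization's own governance policy.

```python
from typing import Dict, List, Set

# Hypothetical policy: required artifacts per risk tier.
REQUIRED_ARTIFACTS: Dict[str, Set[str]] = {
    "low": {"spec", "tests"},
    "medium": {"spec", "tests", "evals", "security_check"},
    "high": {"spec", "tests", "evals", "security_check",
             "rollback_plan", "monitoring", "runbook"},
}

# Hypothetical policy: approval owners per risk tier.
REQUIRED_APPROVALS: Dict[str, Set[str]] = {
    "low": {"product"},
    "medium": {"product", "architecture"},
    "high": {"product", "architecture", "security", "operations"},
}


def release_gaps(tier: str, artifacts: Set[str], approvals: Set[str]) -> Dict[str, List[str]]:
    """Return missing artifacts and approvals for a tier; empty lists mean ready."""
    if tier not in REQUIRED_ARTIFACTS:
        raise ValueError(f"unknown risk tier: {tier}")
    return {
        "artifacts": sorted(REQUIRED_ARTIFACTS[tier] - artifacts),
        "approvals": sorted(REQUIRED_APPROVALS[tier] - approvals),
    }
```

Keeping the policy in plain dictionaries makes it reviewable in a pull request, so changes to the gates themselves go through the same approval path.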
How to choose a reference architecture
| Main problem | Choose |
|---|---|
| Team wants better AI-assisted coding | Architecture 1 |
| Product needs RAG or knowledge assistant | Architecture 2 |
| Product needs stateful autonomous workflow | Architecture 3 |
| Platform team wants custom open agent runtime | Architecture 4 |
| Enterprise needs audit, approvals, NFRs, governance | Architecture 5 |
Combination rule
Start with one owner per layer. Do not combine two frameworks that both claim the same artifact unless you explicitly define precedence.
Example:
| Artifact | Owner |
|---|---|
| Requirement spec | Spec Kit or OpenSpec, not both for the same feature |
| Lifecycle approval | AI-DLC |
| Agent execution | Codex/Claude/Hermes |
| AI app runtime | LangChain or LangGraph |
| Tool permissions | Tool gateway |
| Production proof | Evals and observability |
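
The combination rule can be checked automatically. This is a minimal sketch with hypothetical names (`ownership_conflicts`, `claims`, `precedence`): it inverts a framework-to-artifact mapping and reports any artifact claimed by more than one framework without an explicit precedence entry.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def ownership_conflicts(
    claims: Dict[str, List[str]],
    precedence: Optional[Dict[str, str]] = None,
) -> Dict[str, List[str]]:
    """Map each contested artifact to the frameworks that claim it.

    `claims` maps a framework name to the artifacts it owns. `precedence`
    maps an artifact to the framework that wins; a contested artifact with
    a valid precedence entry is not reported as a conflict.
    """
    precedence = precedence or {}
    owners: Dict[str, List[str]] = defaultdict(list)
    for framework, artifacts in claims.items():
        for artifact in artifacts:
            owners[artifact].append(framework)
    return {
        artifact: sorted(frameworks)
        for artifact, frameworks in owners.items()
        if len(frameworks) > 1 and precedence.get(artifact) not in frameworks
    }
```

For example, if both Spec Kit and OpenSpec claim the requirement spec, the function reports that artifact until a precedence entry names one of them as the owner.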