Scenario Lab: One Feature, Different Workflows
This page makes the differences concrete by applying each workflow to the same product scenario.
Scenario
Build a RAG support assistant for an internal SaaS operations team.
The assistant must:
- answer questions from approved product docs and runbooks;
- cite sources;
- refuse when the answer is not grounded;
- optionally create a draft incident ticket;
- log traces and eval results;
- support safe rollout behind a feature flag.
flowchart TB
U[Support operator] --> A[Assistant UI]
A --> R[Retriever]
R --> D[Approved docs and runbooks]
A --> L[LLM]
L --> T{Need ticket draft?}
T -->|Yes| G[Ticket tool with approval]
T -->|No| O[Grounded answer]
G --> O
O --> E[Evals, traces, audit]
Same target architecture
The runtime architecture does not change much across workflows. What changes is the source of truth and the control surface.
| Layer | Suggested choice | Why |
|---|---|---|
| App framework | LangChain | Fast RAG composition, retriever/tool integration |
| Stateful orchestration | LangGraph if ticket workflow becomes multi-step | Add state, checkpoint, approval edge |
| Tool protocol | MCP or explicit tool gateway | Keep ticket tool permissions auditable |
| Workflow | Depends on scenario below | Controls requirements, risk, and delivery |
| Evals | Golden Q&A set + grounding checks | Prevent hallucinated support answers |
| Observability | Trace every retrieval, model call, and tool proposal | Debug quality and support audit |
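The grounding, citation, and refusal requirements from the scenario can be sketched as a small answer contract. This is a minimal sketch rather than a LangChain implementation: `Passage`, `Answer`, `answer_from_passages`, and the `MIN_SCORE` threshold are hypothetical names and values introduced here, and the LLM call is replaced by simple concatenation.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    source_id: str  # hypothetical identifier, e.g. a runbook path
    text: str
    score: float    # retriever relevance; a 0..1 scale is assumed


@dataclass
class Answer:
    text: str
    citations: list[str] = field(default_factory=list)
    refused: bool = False


MIN_SCORE = 0.5  # assumed threshold; tune it against the golden Q&A set


def answer_from_passages(question: str, passages: list[Passage]) -> Answer:
    """Answer only from sufficiently relevant passages, otherwise refuse."""
    grounded = [p for p in passages if p.score >= MIN_SCORE]
    if not grounded:
        # Nothing relevant was retrieved: refuse instead of guessing.
        return Answer(text="I can't answer that from the approved docs.", refused=True)
    # Placeholder for the model call: a real pipeline would pass
    # `question` and `grounded` to the LLM here.
    summary = " ".join(p.text for p in grounded)
    return Answer(text=summary, citations=[p.source_id for p in grounded])
```

Keeping the refusal decision outside the model call makes it easy to test without an LLM, and the `citations` list gives the eval suite something concrete to check.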
Path 1: GitHub Spec Kit
Use Spec Kit when the main risk is vague requirements.
Step-by-step
- Write the feature spec: users, scope, data sources, refusal behavior, citation rules.
- Generate the implementation plan: UI, retrieval, prompt contract, ticket tool, evals.
- Break the plan into tasks: ingestion, retriever, prompt, tool policy, tests, docs.
- Implement only against the accepted spec.
- Review whether each requirement has tests or eval evidence.
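The final review step above can be made mechanical with a small traceability check. The sketch below is not part of Spec Kit; the requirement IDs, the evidence names, and the `uncovered_requirements` helper are hypothetical, and the wording of REQ-2 and REQ-3 paraphrases the scenario.

```python
def uncovered_requirements(
    requirements: dict[str, str],
    evidence: dict[str, list[str]],
) -> list[str]:
    """Return the IDs of requirements with no linked test or eval evidence."""
    return sorted(rid for rid in requirements if not evidence.get(rid))


# Hypothetical IDs and evidence names, for illustration only.
requirements = {
    "REQ-1": "Assistant SHALL cite approved runbook source for each operational answer.",
    "REQ-2": "Assistant SHALL refuse when the answer is not grounded.",
    "REQ-3": "Ticket drafts SHALL require approval.",
}
evidence = {
    "REQ-1": ["eval:source_coverage"],
    "REQ-2": ["test_refuses_without_sources"],
    "REQ-3": [],  # nothing linked yet, so the review should flag it
}

missing = uncovered_requirements(requirements, evidence)
print(missing)  # ['REQ-3']
```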
Artifacts
| Artifact | Example content |
|---|---|
| `spec.md` | "Assistant SHALL cite approved runbook source for each operational answer." |
| `plan.md` | RAG architecture, model choice, retrieval strategy, rollout |
| `tasks.md` | Task list mapped to requirements |
| eval report | Grounded answer rate, refusal correctness, source coverage |
Best fit
Product teams where business/product ambiguity is the biggest source of agent mistakes.
Path 2: OpenSpec
Use OpenSpec when the change is scoped and you want lightweight spec discipline.
Step-by-step
- Create a change proposal such as `add-support-rag-assistant`.
- Add delta specs for new capabilities: grounded answer, source citation, ticket draft.
- Define scenarios with `Given / When / Then`.
- Implement the minimal change.
- Validate the change and archive the proposal once adopted.
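The Given / When / Then scenarios from the steps above can also be expressed as executable checks. The sketch below does not use OpenSpec's file format or tooling: `Scenario`, `stub_assistant`, and `run` are hypothetical, and the stub uses keyword matching in place of a real retriever and model.

```python
from dataclasses import dataclass
from typing import Callable

Docs = dict[str, str]      # source id -> document text
Reply = dict[str, object]  # keys: "sources" (list of ids), "refused" (bool)


@dataclass
class Scenario:
    name: str
    given: Docs                     # Given: the approved docs
    when: str                       # When: the operator's question
    then: Callable[[Reply], bool]   # Then: the expected behavior


def stub_assistant(docs: Docs, question: str) -> Reply:
    """Keyword-matching stand-in for the real RAG pipeline."""
    keywords = {w.strip("?.,").lower() for w in question.split()}
    keywords = {w for w in keywords if len(w) > 4}  # crude stop-word filter
    sources = [sid for sid, text in docs.items()
               if keywords & set(text.lower().split())]
    return {"sources": sources, "refused": not sources}


def run(scenarios: list[Scenario], assistant) -> dict[str, bool]:
    """Evaluate each scenario's Then clause against the assistant's reply."""
    return {s.name: s.then(assistant(s.given, s.when)) for s in scenarios}


DOCS = {"runbooks/worker.md": "to restart the worker run the restart job"}
SCENARIOS = [
    Scenario("cites source", DOCS, "How do I restart the worker?",
             lambda r: r["sources"] == ["runbooks/worker.md"]),
    Scenario("refuses ungrounded", DOCS, "What is the refund policy?",
             lambda r: r["refused"] is True),
]
results = run(SCENARIOS, stub_assistant)
```

Swapping `stub_assistant` for the real pipeline turns the same scenarios into validation evidence for the change.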
Artifacts
| Artifact | Example content |
|---|---|
| change proposal | Why the assistant is needed and what changes |
| delta spec | New support assistant capability requirements |
| validation notes | Test/eval evidence and rollout status |
Best fit
Small-to-mid teams that want the benefits of spec-driven development (SDD) without enterprise-level governance.
Path 3: AWS AI-DLC Workflows
Use AI-DLC when the assistant can affect customers, operations, regulated data, or high-risk decisions.
Step-by-step
- Classify AI behavior: user-facing, tool-using, data-sensitive, operational impact.
- Create a risk register and list the required approvals.
- Define NFRs: latency, data retention, auditability, availability, safety.
- Require security review for document permissions and ticket tool side effects.
- Define eval gates and deployment evidence.
- Release behind a feature flag with monitoring and a rollback plan.
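The approval and eval gates in the steps above can be sketched as a single release check that lists what still blocks rollout. This is an illustrative sketch, not an AI-DLC artifact: the role names mirror the approval record in this section, while the metric thresholds and the `release_blockers` helper are assumptions.

```python
# Roles taken from the approval record in this section.
REQUIRED_APPROVALS = {"product", "security", "platform", "operations"}


def release_blockers(
    approvals: set[str],
    eval_results: dict[str, float],
    thresholds: dict[str, float],
) -> list[str]:
    """List every missing approval and failed or missing eval gate."""
    blockers = [f"missing approval: {role}"
                for role in sorted(REQUIRED_APPROVALS - approvals)]
    for metric, minimum in sorted(thresholds.items()):
        value = eval_results.get(metric)
        if value is None:
            blockers.append(f"missing eval: {metric}")
        elif value < minimum:
            blockers.append(f"{metric} {value:.2f} below {minimum:.2f}")
    return blockers


# Hypothetical thresholds and results, for illustration only.
thresholds = {
    "grounded_answer_rate": 0.95,
    "refusal_correctness": 0.90,
    "source_coverage": 0.90,
}
blockers = release_blockers(
    approvals={"product", "security"},
    eval_results={"grounded_answer_rate": 0.97, "refusal_correctness": 0.80},
    thresholds=thresholds,
)
for line in blockers:
    print(line)
```

An empty list means the gate passes; storing the returned list alongside the deployment record gives auditors the reason a rollout was held.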
Artifacts
| Artifact | Example content |
|---|---|
| risk register | Hallucinated runbook step, unauthorized ticket creation, stale docs |
| approval record | Product, security, platform, operations |
| NFR checklist | Latency, retention, availability, traceability |
| audit evidence | Eval run, traces, approval decisions, deployment record |
Best fit
Enterprise teams where the cost of a wrong AI action is high.
Path 4: GSD
Use GSD when the work is long-running, multi-agent, or easy to lose across sessions.
Step-by-step
- Create a mission and phase plan.
- Build a context packet: repo map, data sources, existing support flows, constraints.
- Assign phases: discovery, ingestion, retriever, prompt/tool, eval, rollout.
- After each session, update handoff notes with decisions and remaining risks.
- Use the context packet to resume without re-discovering the project.
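The handoff notes described above are easier to resume from when they have a fixed shape. The sketch below is an assumption about what such a note could contain, not GSD's actual format; the `Handoff` fields and the `dump`/`load` helpers are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    phase: str
    decisions: list[str] = field(default_factory=list)
    open_risks: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def dump(note: Handoff) -> str:
    """Serialize a handoff note so the next session can read it."""
    return json.dumps(asdict(note), indent=2)


def load(raw: str) -> Handoff:
    """Rebuild the note at the start of the next session."""
    return Handoff(**json.loads(raw))


# Hypothetical content, for illustration only.
note = Handoff(
    phase="retriever",
    decisions=["Index only approved runbooks"],
    open_risks=["Stale docs are not yet detected"],
    next_steps=["Add refusal eval cases"],
)
restored = load(dump(note))
```

Because the note round-trips through plain JSON, it can live in the repository next to the context packet and be diffed between sessions.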
Artifacts
| Artifact | Example content |
|---|---|
| phase plan | Discovery -> RAG implementation -> eval -> rollout |
| context packet | Repo structure, docs inventory, tool API notes |
| handoff notes | What changed, what failed, what to do next |
Best fit
Long-running delivery where continuity matters more than formal approval gates.
Path 5: Superpowers
Use Superpowers when the agent needs stronger engineering discipline.
Step-by-step
- Brainstorm edge cases before implementation.
- Write a design note for retrieval, prompting, ticket tool policy, and evals.
- Write failing tests or eval cases first.
- Implement the smallest useful change.
- Run tests and inspect traces.
- Review the diff for risk, missing tests, and behavior drift.
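For the tests-first step above, the ticket draft approval rule is a convenient place to start. The sketch below shows two tests together with the smallest implementation that satisfies them; `TicketDraft`, `ApprovalRequired`, and `submit_ticket` are hypothetical names, not part of Superpowers or any ticketing API.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when a ticket draft is submitted without a named approver."""


@dataclass
class TicketDraft:
    title: str
    body: str
    approved_by: Optional[str] = None


def submit_ticket(draft: TicketDraft) -> str:
    """Smallest implementation that makes both tests below pass."""
    if not draft.approved_by:
        raise ApprovalRequired(draft.title)
    return f"submitted with approval from {draft.approved_by}: {draft.title}"


def test_unapproved_draft_is_blocked() -> None:
    try:
        submit_ticket(TicketDraft("DB failover", "steps"))
    except ApprovalRequired:
        return
    raise AssertionError("unapproved draft was submitted")


def test_approved_draft_is_submitted() -> None:
    result = submit_ticket(TicketDraft("DB failover", "steps", approved_by="ops-lead"))
    assert "ops-lead" in result


test_unapproved_draft_is_blocked()
test_approved_draft_is_submitted()
```

In a tests-first flow the two test functions would be written and seen to fail before `submit_ticket` exists; they are shown together here only so the block runs on its own.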
Artifacts
| Artifact | Example content |
|---|---|
| design note | Retriever behavior, prompt contract, refusal policy |
| tests first | Source citation, no-answer refusal, ticket draft approval |
| review checklist | Edge cases, security, observability, docs |
Best fit
Any team using an AI coding agent that tends to move too fast without verification.
Combined best-practice stack
For this scenario, a pragmatic production stack is:
flowchart LR
A[OpenSpec proposal] --> B[Superpowers TDD]
B --> C[LangChain RAG implementation]
C --> D[Tool permission matrix]
D --> E[RAG eval checklist]
E --> F[Feature flag rollout]
If the ticket tool can trigger real operational impact, upgrade governance:
flowchart LR
A[AI-DLC risk record] --> B[Spec or OpenSpec proposal]
B --> C[LangGraph approval edge]
C --> D[Tool gateway]
D --> E[Eval and audit evidence]
E --> F[Approved rollout]
What this lab proves
The workflows are not just different branding around plan -> implement -> review.
| Framework | What changes in practice |
|---|---|
| Spec Kit | Requirements become the controlling artifact |
| OpenSpec | Change proposal and delta spec govern the work |
| AI-DLC | Risk, approval, and audit become first-class gates |
| GSD | Context continuity becomes the delivery backbone |
| Superpowers | Engineering discipline becomes explicit and repeatable |
| LangChain/LangGraph | Runtime behavior is implemented, not governed |
| Hermes | Agent execution is harnessed, not specified |