AI Solution Architecture

Projects

The projects convert the curriculum into architecture artifacts. Each project should produce diagrams, a decision log, a risk register, and a verification plan.

Use the reusable templates in the AI Solution Architecture Toolkit while completing each project.

Project Progression

flowchart TB
    P01[P01 Agent architecture comparison] --> P02[P02 Serving runtime selection]
    P02 --> P03[P03 RAG data plane]
    P03 --> P04[P04 Adaptation and training plan]
    P04 --> P05[P05 LLMOps and evaluation layer]
    P05 --> P06[P06 Capstone production AI platform]

P01: Agent Architecture Comparison

Build a design comparison for the same assistant use case across OpenAI Agents Python, LangChain/LangGraph, AutoGen, and LlamaIndex.

Deliverables:

Core decision: choose whether the product should be primarily an agent loop, workflow graph, multi-agent team, retrieval engine, or hybrid.
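
One way to make this comparison auditable in the decision log is a weighted scoring matrix. The sketch below is a minimal illustration: the criteria names, weights, and 1-5 scores are hypothetical placeholders, not an assessment of any framework, and should be replaced with values the team can justify.

```python
# Hypothetical criteria and integer weights; take the real ones from your decision log.
WEIGHTS = {"control_flow": 4, "retrieval_depth": 3, "multi_agent": 3}

# Placeholder 1-5 scores for illustration only, not an assessment of any framework.
SCORES = {
    "agent_loop": {"control_flow": 3, "retrieval_depth": 2, "multi_agent": 2},
    "workflow_graph": {"control_flow": 5, "retrieval_depth": 2, "multi_agent": 3},
    "multi_agent_team": {"control_flow": 3, "retrieval_depth": 2, "multi_agent": 5},
    "retrieval_engine": {"control_flow": 2, "retrieval_depth": 5, "multi_agent": 1},
}


def rank_options(scores, weights):
    """Return (option, weighted_total) pairs, best first."""
    totals = {
        option: sum(weights[name] * per_criterion[name] for name in weights)
        for option, per_criterion in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_options(SCORES, WEIGHTS)
for option, total in ranking:
    print(f"{option}: {total}")
```

If the top two options land close together, that is a reasonable signal to evaluate the hybrid option rather than force a single style.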

P02: Serving Runtime Selection

Design the serving layer for the chosen application. Compare Transformers, vLLM, llama.cpp, and a UI gateway such as Open WebUI.

Deliverables:

Core decision: choose the runtime that matches the deployment environment and traffic pattern.
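
A rough memory budget often narrows the runtime choice before any benchmarking. The sketch below estimates weight and KV-cache memory for a standard decoder-only transformer; the model shape (32 layers, 8 KV heads, head dimension 128) and traffic numbers are hypothetical, and the estimate ignores activation memory and runtime overhead.

```python
GIB = 2**30


def weight_bytes(n_params: int, bytes_per_param: int) -> int:
    """Memory for model weights at a given precision (2 bytes for fp16/bf16)."""
    return n_params * bytes_per_param


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, concurrent_seqs: int,
                   bytes_per_value: int = 2) -> int:
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens * concurrent_seqs


# Hypothetical 8B-parameter model serving 16 concurrent 4096-token sequences.
weights = weight_bytes(8_000_000_000, 2)
cache = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                       context_tokens=4096, concurrent_seqs=16)
print(f"weights: {weights / GIB:.1f} GiB, KV cache: {cache / GIB:.1f} GiB")
```

Comparing the total against the memory of the target hardware gives a first check on whether the expected concurrency is feasible for a given deployment environment.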

P03: RAG Data Plane

Design retrieval using Qdrant or Chroma. Define ingestion, chunking, embedding versioning, collection layout, metadata filters, deletion/update semantics, and query routing.

Deliverables:

Core decision: choose the vector store and operating mode that fit durability, scale, developer velocity, and governance needs.
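
The data contract can be prototyped independently of the vector store. The sketch below is store-agnostic and deliberately calls no Qdrant or Chroma API; the `ChunkRecord` fields, the `base__version` collection naming scheme, and the character-based chunker are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRecord:
    """Hypothetical per-chunk contract carrying tenancy and embedding version."""
    doc_id: str
    chunk_index: int
    text: str
    tenant_id: str
    embedding_model: str
    embedding_version: str

    @property
    def point_id(self) -> str:
        # Deterministic ID so re-ingestion overwrites instead of duplicating.
        return f"{self.doc_id}:{self.chunk_index}:{self.embedding_version}"


def collection_name(base: str, embedding_version: str) -> str:
    """One collection per embedding version keeps old and new vectors apart."""
    return f"{base}__{embedding_version}"


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_records(doc_id, text, tenant_id, embedding_model, embedding_version,
                  size=200, overlap=20):
    return [
        ChunkRecord(doc_id, index, piece, tenant_id, embedding_model, embedding_version)
        for index, piece in enumerate(chunk_text(text, size, overlap))
    ]


def ids_to_delete(records, doc_id):
    """Deletion semantics: removing a document removes every chunk derived from it."""
    return [record.point_id for record in records if record.doc_id == doc_id]
```

For example, `build_records("doc-1", "abcdefghij", "tenant-a", "example-embedder", "v2", size=4, overlap=1)` yields three chunks whose IDs all carry the `v2` suffix, so a later re-embedding under `v3` can coexist with them until the old collection is retired.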

P04: Adaptation And Training Plan

Decide if the product needs training. Compare prompting, retrieval, PEFT adapters, and DeepSpeed distributed training.

Deliverables:

Core decision: decide whether the quality gap is best closed by better data, orchestration changes, retrieval, adapter tuning, or full training.
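
One simple way to ground this decision is to label each failed example from an error analysis with one of the five categories and see which dominates. The sketch below assumes such hand-assigned labels exist; the category identifiers mirror the list above, and the sample labels are made up.

```python
from collections import Counter

GAP_CATEGORIES = ("data", "orchestration", "retrieval", "adapter_tuning", "full_training")


def dominant_gap(labels: list[str]) -> tuple[str, float]:
    """Return the most frequent gap category and its share of all labeled failures."""
    if not labels:
        raise ValueError("need at least one labeled failure")
    unknown = set(labels) - set(GAP_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    category, count = Counter(labels).most_common(1)[0]
    return category, count / len(labels)


# Made-up labels from a hypothetical review of ten failed examples.
labels = ["retrieval"] * 6 + ["data"] * 3 + ["adapter_tuning"]
category, share = dominant_gap(labels)
print(f"{category} accounts for {share:.0%} of failures")
```

A dominant share for a cheap category such as retrieval is an argument against starting with adapter tuning or distributed training.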

P05: LLMOps And Evaluation Layer

Design tracing, scoring, feedback, datasets, experiment lineage, and promotion gates using Langfuse, Phoenix, MLflow, and TruLens.

Deliverables:

Core decision: define what evidence is required before the system can be called better.
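
A promotion gate can be expressed as a small, testable policy before it is wired into any tool. The sketch below compares paired per-example scores for a baseline and a candidate; the `GatePolicy` fields and thresholds are hypothetical, and the code uses no Langfuse, Phoenix, MLflow, or TruLens API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GatePolicy:
    """Hypothetical thresholds; set them from your own evaluation plan."""
    min_examples: int
    min_mean_gain: float
    max_regression_rate: float


def promotion_decision(baseline, candidate, policy):
    """Return (promote, reasons); reasons lists every failed check."""
    if len(baseline) != len(candidate):
        raise ValueError("scores must be paired per dataset example")
    reasons = []
    n = len(baseline)
    if n == 0 or n < policy.min_examples:
        reasons.append(f"only {n} examples, need {policy.min_examples}")
    if n:
        deltas = [c - b for b, c in zip(baseline, candidate)]
        if mean(deltas) < policy.min_mean_gain:
            reasons.append("mean gain below threshold")
        regression_rate = sum(d < 0 for d in deltas) / n
        if regression_rate > policy.max_regression_rate:
            reasons.append("too many per-example regressions")
    return (not reasons, reasons)


policy = GatePolicy(min_examples=4, min_mean_gain=0.05, max_regression_rate=0.10)
baseline = [0.5, 0.5, 0.5, 0.5]
print(promotion_decision(baseline, [0.75, 0.75, 0.75, 0.5], policy))
print(promotion_decision(baseline, [0.75, 0.75, 0.5, 0.25], policy))
```

The second candidate improves the mean but regresses on one of four examples, so it is rejected; checking per-example regressions alongside the mean keeps an average gain from hiding new failures.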

P06: Capstone Production AI Platform

Design an end-to-end AI solution using all six layers. The recommended capstone is the Enterprise Knowledge Copilot for Architecture Review, because it exercises retrieval, tool governance, evaluation, traceability, and production readiness in one scenario.

flowchart LR
    UI[User interface / Open WebUI or app] --> Agent[Agent or workflow layer]
    Agent --> MCP[MCP tools and internal APIs]
    Agent --> RAG[RAG engine]
    Agent --> Runtime[Model serving runtime]
    RAG --> VectorDB[Qdrant or Chroma]
    Runtime --> Model[Base model + adapter]
    PEFT[PEFT / DeepSpeed training plan] --> Model
    Agent --> Obs[Langfuse / Phoenix / TruLens traces]
    Runtime --> Obs
    Obs --> MLflow[MLflow lineage and registry]
    Obs --> Gate[Promotion gate]
    Gate --> Release[Production release]
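
The component graph above can also be kept as data and checked automatically. The sketch below transcribes the diagram's edges and verifies one property: a production release is reachable from the user interface only through the promotion gate. The helper names are hypothetical, and this is one possible check rather than a required one.

```python
from collections import deque

# Edges transcribed from the capstone flowchart, using its node IDs.
EDGES = [
    ("UI", "Agent"), ("Agent", "MCP"), ("Agent", "RAG"), ("Agent", "Runtime"),
    ("RAG", "VectorDB"), ("Runtime", "Model"), ("PEFT", "Model"),
    ("Agent", "Obs"), ("Runtime", "Obs"), ("Obs", "MLflow"),
    ("Obs", "Gate"), ("Gate", "Release"),
]


def reachable(edges, start, blocked=frozenset()):
    """Breadth-first search that never enters a node listed in `blocked`."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt not in blocked:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def release_is_gated(edges):
    """True if Release is reachable from UI, but not once Gate is removed."""
    with_gate = reachable(edges, "UI")
    without_gate = reachable(edges, "UI", blocked={"Gate"})
    return "Release" in with_gate and "Release" not in without_gate


print(release_is_gated(EDGES))
```

Adding a shortcut edge such as `("Agent", "Release")` makes the check fail, which is the kind of architectural drift a verification plan can catch in review.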

Deliverables:

Review Rubric

| Area | Pass Criteria |
| --- | --- |
| Layering | Each layer has a clear owner and boundary. |
| Runtime | Serving choice matches capacity, latency, memory, and deployment constraints. |
| Data | RAG data contract includes versioning, tenancy, deletion, and access policy. |
| Evaluation | Quality is measured with datasets, traces, scores, and promotion gates. |
| Security | Tool execution, secrets, auth, data access, and audit logging are explicit. |
| Operations | Health checks, incidents, rollback, cost, and ownership are defined. |
| Evidence | Every architectural claim links to a repository deep dive or design artifact. |
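
The rubric can be applied as a simple checklist during project review. The sketch below assumes a reviewer records one pass/fail judgment per area; the `review` function and its return shape are illustrative, not part of the toolkit.

```python
RUBRIC_AREAS = (
    "Layering", "Runtime", "Data", "Evaluation",
    "Security", "Operations", "Evidence",
)


def review(findings: dict[str, bool]) -> dict:
    """Summarize a review; every rubric area must have an explicit judgment."""
    missing = [area for area in RUBRIC_AREAS if area not in findings]
    if missing:
        raise ValueError(f"no judgment recorded for: {missing}")
    failed = [area for area in RUBRIC_AREAS if not findings[area]]
    return {"passed": not failed, "failed_areas": failed}


# Made-up review in which only the Security area fails.
findings = {area: True for area in RUBRIC_AREAS}
findings["Security"] = False
print(review(findings))
```

Raising on a missing area rather than treating it as a pass keeps an unreviewed area from slipping through.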