AI Solution Architecture

Repository Atlas

Use this atlas when selecting libraries or explaining how the 17 repositories fit into a complete AI architecture.

System-Level Placement

flowchart TB
  subgraph App[AI app and agent architecture]
    OA[OpenAI Agents Python]
    LC[LangChain]
    AG[AutoGen]
    LI[LlamaIndex]
  end
  subgraph Serving[Model serving and inference]
    HF[Transformers]
    VLLM[vLLM]
    LCPP[llama.cpp]
  end
  subgraph Train[Training and adaptation]
    PEFT[PEFT]
    DS[DeepSpeed]
  end
  subgraph RAG[RAG and vector data]
    QD[Qdrant]
    CH[Chroma]
  end
  subgraph Ops[Observability and LLMOps]
    LF[Langfuse]
    PX[Phoenix]
    MF[MLflow]
    TL[TruLens]
  end
  subgraph Platform[Tools and platform]
    MCP[MCP Servers]
    OW[Open WebUI]
  end
  App --> Serving
  App --> RAG
  Train --> Serving
  App --> Ops
  Serving --> Ops
  RAG --> Ops
  Platform --> App
  Platform --> Ops
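The placement diagram can also be read as plain data, which is convenient when a review needs to answer "which layer does this repository sit in?" programmatically. The following is a minimal sketch that transcribes the subgraphs and arrows from the flowchart; the names `LAYERS`, `EDGES`, `layer_of`, and `downstream` are illustrative assumptions, and the sketch records only arrow direction because the diagram does not state what each arrow means.

```python
# Layer membership transcribed from the flowchart subgraphs.
LAYERS = {
    "App": ["OpenAI Agents Python", "LangChain", "AutoGen", "LlamaIndex"],
    "Serving": ["Transformers", "vLLM", "llama.cpp"],
    "Train": ["PEFT", "DeepSpeed"],
    "RAG": ["Qdrant", "Chroma"],
    "Ops": ["Langfuse", "Phoenix", "MLflow", "TruLens"],
    "Platform": ["MCP Servers", "Open WebUI"],
}

# Arrows transcribed from the flowchart as (source, target) pairs.
EDGES = [
    ("App", "Serving"),
    ("App", "RAG"),
    ("Train", "Serving"),
    ("App", "Ops"),
    ("Serving", "Ops"),
    ("RAG", "Ops"),
    ("Platform", "App"),
    ("Platform", "Ops"),
]


def layer_of(repo: str) -> str:
    """Return the layer key that contains the given repository name."""
    for layer, repos in LAYERS.items():
        if repo in repos:
            return layer
    raise KeyError(f"unknown repository: {repo}")


def downstream(layer: str) -> list[str]:
    """Return the layers that `layer` points at directly, sorted by name."""
    return sorted(dst for src, dst in EDGES if src == layer)
```

For example, `layer_of("vLLM")` returns `"Serving"`, and `downstream("App")` lists the three layers the App layer points at.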

Repository Matrix

| Repository | Primary Role | Use When | Watch For |
| --- | --- | --- | --- |
| OpenAI Agents Python | Agent runtime with tools, handoffs, guardrails, tracing | You want a focused agent SDK with explicit handoff/tool semantics | Tool permissions, guardrail coverage, trace completeness |
| LangChain | Composable app framework and workflow ecosystem | You need chains, retrievers, tools, model abstraction, LangGraph workflows | Over-composition, unclear state boundaries, dependency sprawl |
| AutoGen | Multi-agent framework with Core, AgentChat, extensions | You need role-based agent collaboration or existing AutoGen assets | Maintenance-mode implications, code execution risk, extension governance |
| LlamaIndex | Data-centric agent and retrieval framework | Knowledge ingestion, indices, query engines, RAG workflows | Chunking quality, index freshness, retrieval confidence |
| Transformers | Model API and compatibility backbone | Model experimentation, tokenizer/model loading, pipelines, training utilities | Runtime performance, artifact trust, remote code, memory use |
| vLLM | High-throughput LLM serving runtime | Token throughput, concurrent serving, OpenAI-compatible endpoints | Capacity planning, scheduler behavior, model support, GPU memory |
| llama.cpp | Local and edge inference runtime | CPU/edge/local serving, quantized models, portable binaries | Quantization quality, context limits, API exposure, model conversion |
| PEFT | Parameter-efficient fine-tuning | Domain/task adaptation with adapters instead of full fine-tuning | Adapter compatibility, unsafe artifacts, evaluation before promotion |
| DeepSpeed | Distributed training optimization | Large training jobs, ZeRO, memory partitioning, checkpoint scale | Cluster reliability, optimizer state, checkpoint recovery |
| Qdrant | Vector database with strong search and operations model | Durable vector search, payload filtering, sharding, distributed operation | WAL/segment recovery, filter correctness, tenancy boundaries |
| Chroma | Developer-friendly vector database and RAG store | Local/server RAG development and Python-first workflows | Mode selection, persistence settings, distributed maturity |
| Langfuse | LLM tracing, prompt, dataset, evaluation, feedback platform | Product teams need trace and score visibility | PII retention, project isolation, ClickHouse/Postgres operations |
| Phoenix | LLM observability and evaluation | Trace analysis, datasets, annotations, evaluators | Auth, evaluator safety, trace volume, database isolation |
| MLflow | Experiment tracking, model registry, artifacts | You need ML lifecycle lineage across experiments and models | Artifact access, auth, registry policy, tracking server security |
| TruLens | Feedback functions and LLM app evaluation | You need groundedness, relevance, and app-level eval checks | Eval cost, feedback calibration, metric misuse |
| MCP Servers | Reference tool server patterns | You need tool contracts between models/clients and external systems | Least privilege, schema quality, sandboxing, audit logs |
| Open WebUI | Self-hosted AI workspace and provider gateway | You need a UI, model routing, RAG, tools, admin controls | Admin boundary, tool execution, provider secrets, CORS/auth |

Decision Guide

flowchart TB
  Start[Architecture question] --> AppQ{Need app orchestration?}
  AppQ -->|Yes| AgentChoice{Main control model}
  AgentChoice -->|Agent SDK| OA
  AgentChoice -->|Workflow graph| LC
  AgentChoice -->|Multi-agent team| AG
  AgentChoice -->|RAG/query engine| LI
  Start --> RuntimeQ{Need model runtime?}
  RuntimeQ -->|Compatibility| HF
  RuntimeQ -->|Throughput| VLLM
  RuntimeQ -->|Local/edge| LCPP
  Start --> DataQ{Need knowledge retrieval?}
  DataQ -->|Operational vector DB| QD
  DataQ -->|Developer-friendly RAG store| CH
  Start --> EvalQ{Need evidence loop?}
  EvalQ -->|LLM traces| LF
  EvalQ -->|Observability/evals| PX
  EvalQ -->|Experiment lineage| MF
  EvalQ -->|Feedback metrics| TL
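The decision guide is a two-level lookup: first the kind of need, then the priority within that need. A minimal sketch of that lookup follows, assuming the node IDs in the diagram (OA, LC, HF, and so on) map to the repository names given in the System-Level Placement diagram. The names `DECISION_GUIDE` and `recommend` are hypothetical, and the guide returns a first candidate to investigate, not a final choice.

```python
# Branches transcribed from the decision flowchart.
# Outer key: the question asked; inner key: the edge label; value: the repository.
DECISION_GUIDE = {
    "app orchestration": {
        "agent sdk": "OpenAI Agents Python",
        "workflow graph": "LangChain",
        "multi-agent team": "AutoGen",
        "rag/query engine": "LlamaIndex",
    },
    "model runtime": {
        "compatibility": "Transformers",
        "throughput": "vLLM",
        "local/edge": "llama.cpp",
    },
    "knowledge retrieval": {
        "operational vector db": "Qdrant",
        "developer-friendly rag store": "Chroma",
    },
    "evidence loop": {
        "llm traces": "Langfuse",
        "observability/evals": "Phoenix",
        "experiment lineage": "MLflow",
        "feedback metrics": "TruLens",
    },
}


def recommend(need: str, priority: str) -> str:
    """Return the candidate repository for a need and priority.

    Matching is case-insensitive. Raises KeyError when the diagram
    has no branch for the given need or priority.
    """
    branches = DECISION_GUIDE[need.strip().lower()]
    return branches[priority.strip().lower()]
```

For example, `recommend("Model runtime", "Throughput")` returns `"vLLM"`. A real review would usually walk several branches, since one product often needs orchestration, a runtime, retrieval, and an evidence loop at once.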

Cross-Cutting Production Risks

| Risk | Appears In | Architecture Response |
| --- | --- | --- |
| Tool side effects | Agents, AutoGen, MCP servers, Open WebUI | Use explicit schemas, approvals, audit logs, sandboxing, and least privilege. |
| Model artifact trust | Transformers, PEFT, llama.cpp, vLLM | Pin artifacts, prefer safe formats, review remote code, track provenance. |
| Retrieval drift | LlamaIndex, LangChain, Qdrant, Chroma | Version embeddings, chunks, metadata, and query configs; evaluate retrieval separately. |
| Trace and PII exposure | Langfuse, Phoenix, TruLens, MLflow | Redact inputs, define retention, isolate tenants, encrypt secrets. |
| Serving overload | vLLM, llama.cpp, Open WebUI | Add admission control, capacity metrics, scaling policy, fallback routing. |
| Training irreproducibility | PEFT, DeepSpeed, MLflow | Track dataset, seed, config, checkpoint, adapter, tokenizer, and evaluation run. |
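One way to apply the risk table is to take a proposed stack and list every risk whose "Appears In" column names at least one of its repositories. The sketch below does that with plain sets. `RISKS` and `risks_for` are illustrative names, and it assumes that "Agents" in the table refers to OpenAI Agents Python.

```python
# Risk name -> (repositories it appears in, architecture response),
# transcribed from the table above.
# Assumption: "Agents" in the table means OpenAI Agents Python.
RISKS = {
    "Tool side effects": (
        {"OpenAI Agents Python", "AutoGen", "MCP Servers", "Open WebUI"},
        "Use explicit schemas, approvals, audit logs, sandboxing, and least privilege.",
    ),
    "Model artifact trust": (
        {"Transformers", "PEFT", "llama.cpp", "vLLM"},
        "Pin artifacts, prefer safe formats, review remote code, track provenance.",
    ),
    "Retrieval drift": (
        {"LlamaIndex", "LangChain", "Qdrant", "Chroma"},
        "Version embeddings, chunks, metadata, and query configs; "
        "evaluate retrieval separately.",
    ),
    "Trace and PII exposure": (
        {"Langfuse", "Phoenix", "TruLens", "MLflow"},
        "Redact inputs, define retention, isolate tenants, encrypt secrets.",
    ),
    "Serving overload": (
        {"vLLM", "llama.cpp", "Open WebUI"},
        "Add admission control, capacity metrics, scaling policy, fallback routing.",
    ),
    "Training irreproducibility": (
        {"PEFT", "DeepSpeed", "MLflow"},
        "Track dataset, seed, config, checkpoint, adapter, tokenizer, "
        "and evaluation run.",
    ),
}


def risks_for(stack: set[str]) -> list[str]:
    """Return the names of risks that touch any repository in the stack, sorted."""
    return sorted(
        name for name, (repos, _response) in RISKS.items() if repos & stack
    )
```

For a stack of `{"vLLM", "Qdrant"}`, `risks_for` returns model artifact trust, retrieval drift, and serving overload; the matching response text is available as `RISKS[name][1]`.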

How To Use The Atlas In Design Reviews

  1. Start from the product workflow, not from a favorite library.
  2. Assign each requirement to a layer.
  3. Use the matrix to identify candidate repositories.
  4. Check the deep-dive docs for source tree, extension points, and failure modes.
  5. Document why each rejected alternative was rejected.
  6. Define the evidence required to revisit the decision later.
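Steps 5 and 6 are easier to enforce when the outcome of a review is captured in a fixed shape. The following is a minimal sketch of such a decision record, not a prescribed format: the class names, field names, and the example values are all hypothetical and chosen only to mirror the six steps above.

```python
from dataclasses import dataclass, field


@dataclass
class RejectedAlternative:
    repository: str
    reason: str  # step 5: why this alternative was rejected


@dataclass
class DecisionRecord:
    requirement: str  # step 1: stated in terms of the product workflow
    layer: str        # step 2: the layer the requirement was assigned to
    chosen: str       # steps 3-4: candidate selected after reading the deep dive
    rejected: list[RejectedAlternative] = field(default_factory=list)
    revisit_evidence: list[str] = field(default_factory=list)  # step 6

    def is_complete(self) -> bool:
        """True when every rejection has a reason and revisit evidence is defined."""
        reasons_ok = all(alt.reason.strip() for alt in self.rejected)
        return reasons_ok and len(self.revisit_evidence) > 0


# Hypothetical example values, for illustration only.
record = DecisionRecord(
    requirement="Serve many concurrent chat sessions",
    layer="Serving",
    chosen="vLLM",
    rejected=[
        RejectedAlternative("llama.cpp", "Targets local and edge serving"),
    ],
    revisit_evidence=["Deployment target moves to CPU-only hosts"],
)
```

A review checklist could then refuse to close while `record.is_complete()` is false, which keeps the rejection reasons and revisit criteria from being skipped.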