Repository Atlas

Use this atlas when selecting libraries or explaining how the 17 repositories fit into a complete AI architecture.

System-Level Placement

flowchart TB
  subgraph App[AI app and agent architecture]
    OA[OpenAI Agents Python]
    LC[LangChain]
    AG[AutoGen]
    LI[LlamaIndex]
  end
  subgraph Serving[Model serving and inference]
    HF[Transformers]
    VLLM[vLLM]
    LCPP[llama.cpp]
  end
  subgraph Train[Training and adaptation]
    PEFT[PEFT]
    DS[DeepSpeed]
  end
  subgraph RAG[RAG and vector data]
    QD[Qdrant]
    CH[Chroma]
  end
  subgraph Ops[Observability and LLMOps]
    LF[Langfuse]
    PX[Phoenix]
    MF[MLflow]
    TL[TruLens]
  end
  subgraph Platform[Tools and platform]
    MCP[MCP Servers]
    OW[Open WebUI]
  end
  App --> Serving
  App --> RAG
  Train --> Serving
  App --> Ops
  Serving --> Ops
  RAG --> Ops
  Platform --> App
  Platform --> Ops
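
The placement diagram can also be read as plain data, which helps when a review needs to ask which layers a change might affect. The following is a minimal sketch: the layer members and edges are copied from the diagram, while the names `LAYERS`, `EDGES`, `layer_of`, and `reachable` are hypothetical helpers introduced here for illustration.

```python
# Layers and members as drawn in the placement diagram.
LAYERS = {
    "App": ["OpenAI Agents Python", "LangChain", "AutoGen", "LlamaIndex"],
    "Serving": ["Transformers", "vLLM", "llama.cpp"],
    "Train": ["PEFT", "DeepSpeed"],
    "RAG": ["Qdrant", "Chroma"],
    "Ops": ["Langfuse", "Phoenix", "MLflow", "TruLens"],
    "Platform": ["MCP Servers", "Open WebUI"],
}

# Directed edges as drawn: source layer -> layers it points to.
EDGES = {
    "App": ["Serving", "RAG", "Ops"],
    "Train": ["Serving"],
    "Serving": ["Ops"],
    "RAG": ["Ops"],
    "Platform": ["App", "Ops"],
}


def layer_of(repo):
    """Return the layer that contains the given repository name."""
    for layer, repos in LAYERS.items():
        if repo in repos:
            return layer
    raise KeyError(repo)


def reachable(layer):
    """Return every layer reachable from `layer` by following the arrows."""
    seen = set()
    stack = list(EDGES.get(layer, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(EDGES.get(current, []))
    return seen
```

For example, `reachable("Train")` yields the Serving and Ops layers, which matches the diagram: adapted models flow into serving, and serving feeds observability.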

Repository Matrix

Repository	Primary Role	Use When	Watch For
OpenAI Agents Python	Agent runtime with tools, handoffs, guardrails, tracing	You want a focused agent SDK with explicit handoff/tool semantics	Tool permissions, guardrail coverage, trace completeness
LangChain	Composable app framework and workflow ecosystem	You need chains, retrievers, tools, model abstraction, LangGraph workflows	Over-composition, unclear state boundaries, dependency sprawl
AutoGen	Multi-agent framework with Core, AgentChat, extensions	You need role-based agent collaboration or existing AutoGen assets	Maintenance-mode implications, code execution risk, extension governance
LlamaIndex	Data-centric agent and retrieval framework	Knowledge ingestion, indices, query engines, RAG workflows	Chunking quality, index freshness, retrieval confidence
Transformers	Model API and compatibility backbone	Model experimentation, tokenizer/model loading, pipelines, training utilities	Runtime performance, artifact trust, remote code, memory use
vLLM	High-throughput LLM serving runtime	Token throughput, concurrent serving, OpenAI-compatible endpoints	Capacity planning, scheduler behavior, model support, GPU memory
llama.cpp	Local and edge inference runtime	CPU/edge/local serving, quantized models, portable binaries	Quantization quality, context limits, API exposure, model conversion
PEFT	Parameter-efficient fine-tuning	Domain/task adaptation with adapters instead of full fine-tuning	Adapter compatibility, unsafe artifacts, evaluation before promotion
DeepSpeed	Distributed training optimization	Large training jobs, ZeRO, memory partitioning, checkpoint scale	Cluster reliability, optimizer state, checkpoint recovery
Qdrant	Vector database with strong search and operations model	Durable vector search, payload filtering, sharding, distributed operation	WAL/segment recovery, filter correctness, tenancy boundaries
Chroma	Developer-friendly vector database and RAG store	Local/server RAG development and Python-first workflows	Mode selection, persistence settings, distributed maturity
Langfuse	LLM tracing, prompt, dataset, evaluation, feedback platform	Product teams need trace and score visibility	PII retention, project isolation, ClickHouse/Postgres operations
Phoenix	LLM observability and evaluation	Trace analysis, datasets, annotations, evaluators	Auth, evaluator safety, trace volume, database isolation
MLflow	Experiment tracking, model registry, artifacts	You need ML lifecycle lineage across experiments and models	Artifact access, auth, registry policy, tracking server security
TruLens	Feedback functions and LLM app evaluation	You need groundedness, relevance, and app-level eval checks	Eval cost, feedback calibration, metric misuse
MCP Servers	Reference tool server patterns	You need tool contracts between models/clients and external systems	Least privilege, schema quality, sandboxing, audit logs
Open WebUI	Self-hosted AI workspace and provider gateway	You need a UI, model routing, RAG, tools, admin controls	Admin boundary, tool execution, provider secrets, CORS/auth

Decision Guide

flowchart TB
  Start[Architecture question] --> AppQ{Need app orchestration?}
  AppQ -->|Yes| AgentChoice{Main control model}
  AgentChoice -->|Agent SDK| OA
  AgentChoice -->|Workflow graph| LC
  AgentChoice -->|Multi-agent team| AG
  AgentChoice -->|RAG/query engine| LI
  Start --> RuntimeQ{Need model runtime?}
  RuntimeQ -->|Compatibility| HF
  RuntimeQ -->|Throughput| VLLM
  RuntimeQ -->|Local/edge| LCPP
  Start --> DataQ{Need knowledge retrieval?}
  DataQ -->|Operational vector DB| QD
  DataQ -->|Developer-friendly RAG store| CH
  Start --> EvalQ{Need evidence loop?}
  EvalQ -->|LLM traces| LF
  EvalQ -->|Observability/evals| PX
  EvalQ -->|Experiment lineage| MF
  EvalQ -->|Feedback metrics| TL
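
The same branches can be written as a lookup table, which is convenient for checklists or review tooling. This is a sketch only: the branch labels and targets come from the decision guide (with node IDs expanded to the repository names used in the placement diagram), while `DECISION_GUIDE` and `recommend` are hypothetical names, and the guide itself remains a starting point rather than a verdict.

```python
# Each top-level key is one question from the decision guide;
# each inner key is a branch label, mapped to the repository it points at.
DECISION_GUIDE = {
    "app orchestration": {
        "agent sdk": "OpenAI Agents Python",
        "workflow graph": "LangChain",
        "multi-agent team": "AutoGen",
        "rag/query engine": "LlamaIndex",
    },
    "model runtime": {
        "compatibility": "Transformers",
        "throughput": "vLLM",
        "local/edge": "llama.cpp",
    },
    "knowledge retrieval": {
        "operational vector db": "Qdrant",
        "developer-friendly rag store": "Chroma",
    },
    "evidence loop": {
        "llm traces": "Langfuse",
        "observability/evals": "Phoenix",
        "experiment lineage": "MLflow",
        "feedback metrics": "TruLens",
    },
}


def recommend(need, criterion):
    """Return the candidate repository for a need and criterion, or None."""
    branch = DECISION_GUIDE.get(need.strip().lower())
    if branch is None:
        return None
    return branch.get(criterion.strip().lower())
```

A call such as `recommend("model runtime", "Throughput")` returns the vLLM branch; an unknown need or criterion returns `None` so the caller can fall back to a manual review.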

Cross-Cutting Production Risks

Risk	Appears In	Architecture Response
Tool side effects	OpenAI Agents Python, AutoGen, MCP Servers, Open WebUI	Use explicit schemas, approvals, audit logs, sandboxing, and least privilege.
Model artifact trust	Transformers, PEFT, llama.cpp, vLLM	Pin artifacts, prefer safe formats, review remote code, track provenance.
Retrieval drift	LlamaIndex, LangChain, Qdrant, Chroma	Version embeddings, chunks, metadata, and query configs; evaluate retrieval separately.
Trace and PII exposure	Langfuse, Phoenix, TruLens, MLflow	Redact inputs, define retention, isolate tenants, encrypt secrets.
Serving overload	vLLM, llama.cpp, Open WebUI	Add admission control, capacity metrics, scaling policy, fallback routing.
Training irreproducibility	PEFT, DeepSpeed, MLflow	Track dataset, seed, config, checkpoint, adapter, tokenizer, and evaluation run.
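
Given a proposed stack, the risk table can be filtered to the rows that apply. The sketch below assumes the "Appears In" column is read as a set of repository names; `RISKS` and `risks_for` are hypothetical names, and "Agents" in the first row is interpreted as OpenAI Agents Python, which is an assumption.

```python
# Risk -> repositories where it appears, copied from the risk table.
# Assumption: "Agents" in the table refers to OpenAI Agents Python.
RISKS = {
    "Tool side effects": {"OpenAI Agents Python", "AutoGen", "MCP Servers", "Open WebUI"},
    "Model artifact trust": {"Transformers", "PEFT", "llama.cpp", "vLLM"},
    "Retrieval drift": {"LlamaIndex", "LangChain", "Qdrant", "Chroma"},
    "Trace and PII exposure": {"Langfuse", "Phoenix", "TruLens", "MLflow"},
    "Serving overload": {"vLLM", "llama.cpp", "Open WebUI"},
    "Training irreproducibility": {"PEFT", "DeepSpeed", "MLflow"},
}


def risks_for(stack):
    """Map each applicable risk to the repositories in `stack` that trigger it."""
    chosen = set(stack)
    return {
        risk: sorted(repos & chosen)
        for risk, repos in RISKS.items()
        if repos & chosen
    }
```

For a stack of LangChain, vLLM, Qdrant, and Langfuse, this returns four risks (model artifact trust, retrieval drift, trace and PII exposure, serving overload), each paired with the repositories that bring it in.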

How To Use The Atlas In Design Reviews

1. Start from the product workflow, not from a favorite library.
2. Assign each requirement to a layer.
3. Use the matrix to identify candidate repositories.
4. Check the deep-dive docs for source tree, extension points, and failure modes.
5. Document why each rejected alternative was rejected.
6. Define the evidence required to revisit the decision later.
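
One way to make these review steps checkable is to capture each decision as a small record. The following is a minimal sketch, not a prescribed format: `DecisionRecord`, its field names, and `gaps` are hypothetical, chosen only to mirror the steps above.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    """One design-review decision, mirroring the review steps."""
    requirement: str                                       # taken from the product workflow
    layer: str                                             # layer the requirement is assigned to
    chosen: str                                            # repository selected from the matrix
    rejected: dict = field(default_factory=dict)           # alternative -> reason for rejection
    revisit_evidence: list = field(default_factory=list)   # evidence that would reopen the decision

    def gaps(self):
        """Return a list of review steps that this record leaves unanswered."""
        missing = []
        if not self.requirement.strip():
            missing.append("requirement")
        if not self.layer.strip():
            missing.append("layer")
        if not self.chosen.strip():
            missing.append("chosen")
        for alternative, reason in self.rejected.items():
            if not reason.strip():
                missing.append(f"reason for rejecting {alternative}")
        if not self.revisit_evidence:
            missing.append("revisit_evidence")
        return missing


# Illustrative usage with placeholder values, not a recommendation.
record = DecisionRecord(
    requirement="Answer questions over internal documents",
    layer="RAG",
    chosen="Qdrant",
    rejected={"Chroma": ""},
)
print(record.gaps())
```

Here the record is flagged as incomplete because the rejection reason is empty and no revisit evidence has been defined, which corresponds to the last two steps of the review.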