Open WebUI - Architecture Notes

Executive summary

github-repos/06-tooling-mcp-ai-platform/open-webui is a full-stack, self-hosted AI platform. It combines a SvelteKit frontend, a FastAPI backend, database-backed configuration, model-provider gateways, RAG pipelines, tool execution, MCP/OpenAPI tool server integrations, file storage, optional Redis-based scale-out, and optional OpenTelemetry instrumentation.

The repository is much larger than a chat UI. It acts as an AI workbench and control plane: users can chat with local or remote models, upload and index documents, manage model access, run tools and functions, connect external tool servers, administer users/groups, configure retrieval, expose pipelines, and deploy through Docker or Python packaging. The codebase is organized around a clear separation: frontend routes and stores in src/, backend routers/models/utilities in backend/open_webui/, and deployment/runtime configuration through .env.example, Dockerfile, docker-compose.yaml, and pyproject.toml.

Problem solved

Open WebUI solves the gap between raw model APIs and a governed, self-hosted AI workspace. Raw APIs expose completion endpoints, but teams also need:

The repo implements these as one deployable application rather than scattered scripts.

AI stack role

Open WebUI sits at the platform layer. It is not just a model client; it is a user-facing AI application gateway, retrieval orchestrator, tool broker, and administration surface.

flowchart LR
    Browser["Browser / desktop web UI<br/>SvelteKit app"] --> Backend["FastAPI backend<br/>backend/open_webui/main.py"]
    Browser <--> Socket["Socket.IO<br/>/ws/socket.io"]
    Backend --> DB["SQL database<br/>SQLite or PostgreSQL"]
    Backend --> Redis["Redis / Valkey<br/>sessions, websocket scale, cache"]
    Backend --> Storage["File storage<br/>local, S3, GCS, Azure Blob"]
    Backend --> Vector["Vector DB<br/>Chroma, Qdrant, Milvus, PGVector, Pinecone, more"]
    Backend --> Ollama["Ollama backends"]
    Backend --> OpenAI["OpenAI-compatible APIs<br/>including Azure-style config"]
    Backend --> Tools["Tools, functions, skills<br/>local DB + external servers"]
    Tools --> MCP["MCP Streamable HTTP servers"]
    Tools --> OpenAPI["OpenAPI tool servers"]
    Backend --> Telemetry["OpenTelemetry collector<br/>optional"]

Source tree map

| Path | Role |
| --- | --- |
| README.md | Product overview, supported features, install paths, Docker/pip examples, offline mode notes, and provider support. |
| package.json | Frontend package metadata and scripts for SvelteKit/Vite, checks, linting, frontend tests, and Pyodide asset fetch. |
| pyproject.toml | Python package metadata, FastAPI backend dependencies, optional vector DB dependencies, and open-webui app entry point. |
| .env.example | Environment variable examples for providers, CORS, telemetry opt-out, and vector DB configuration. |
| Dockerfile | Multi-stage frontend/backend image build, optional CUDA/Ollama/slim behavior, model prefetching, healthcheck, and runtime command. |
| docker-compose.yaml | Default local deployment with ollama, open-webui, data volume, port mapping, and OLLAMA_BASE_URL. |
| docker-compose.otel.yaml | OpenTelemetry/Grafana LGTM deployment example. |
| TROUBLESHOOTING.md | Operational explanation of the backend reverse proxy to Ollama and Docker networking guidance. |
| docs/SECURITY.md | Security policy, especially around tool/function code execution and admin trust boundaries. |
| src/ | SvelteKit frontend routes, components, stores, API clients, workers, and workspace/admin views. |
| backend/open_webui/main.py | FastAPI app construction, lifespan startup/shutdown, middleware, router mounting, config endpoint, health/readiness endpoints, and static app serving. |
| backend/open_webui/config.py | Persistent configuration defaults, environment parsing, provider URLs, feature flags, RAG settings, auth settings, and storage/cache paths. |
| backend/open_webui/env.py | Environment loading, logging setup, version/data directories, DB/Redis options, safe mode, audit logging, and telemetry env flags. |
| backend/open_webui/internal/ | Database engines/sessions and database-backed runtime configuration state. |
| backend/open_webui/models/ | SQLAlchemy/Pydantic data models for users, chats, files, tools, functions, groups, memories, prompts, knowledge, and more. |
| backend/open_webui/routers/ | FastAPI routers for auth, users, chats, models, Ollama, OpenAI, retrieval, tools, functions, files, evaluations, pipelines, SCIM, terminals, and admin utilities. |
| backend/open_webui/retrieval/ | Document loading, embeddings, reranking, vector DB abstraction, web search, and RAG query helpers. |
| backend/open_webui/storage/ | Local and cloud storage providers. |
| backend/open_webui/utils/ | Chat pipeline, middleware, model/provider helpers, MCP client, OpenAPI tool conversion, telemetry, and many integration utilities. |
| backend/open_webui/socket/ | Socket.IO integration used by the frontend for realtime events and browser-executed tasks. |

Core concepts

Self-hosted AI workspace

The application is designed to run under the operator's control. It supports local Ollama deployments, OpenAI-compatible endpoints, offline mode, local data volumes, and external storage/vector providers when needed. This makes it a platform component rather than a simple hosted API wrapper.

Model gateway

The backend routers in routers/ollama.py and routers/openai.py aggregate and proxy model APIs. The application can list models, route chat completions, handle streaming responses, call embeddings APIs, and expose compatibility routes such as OpenAI-style chat/completions and response endpoints. utils/chat.py and utils/middleware.py coordinate model selection, direct connections, arena models, functions, files, tools, and response post-processing.

RAG and knowledge

routers/retrieval.py, retrieval/vector/factory.py, and retrieval/vector/main.py implement the retrieval layer. Documents and web results can be chunked, embedded, stored in collections, and queried through vector or hybrid search. The factory supports many vector backends, including Chroma, Qdrant, Milvus, Pinecone, PGVector, OpenSearch, Elasticsearch, Oracle, Weaviate, S3 vector storage, and Valkey.

Tools, functions, skills, and tool servers

Open WebUI has local tools/functions stored in the application database and external tool server support. utils/tools.py handles access checks, built-in tool catalogs, OpenAPI conversion, HTTP operation execution, and MCP tool server discovery. utils/mcp/client.py implements a Streamable HTTP MCP client with initialization, tool listing, tool calling, resource listing, and resource reading.
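As a rough illustration of the message shapes involved, the sketch below builds the JSON-RPC 2.0 messages an MCP client sends for initialization, tool listing, and tool calling over Streamable HTTP. The method names (`initialize`, `notifications/initialized`, `tools/list`, `tools/call`) and the `Mcp-Session-Id` header come from the MCP specification. The helper function names and the client name are hypothetical, and this is not the code in utils/mcp/client.py.

```python
import itertools
from typing import Optional

# MCP protocol revision that introduced the Streamable HTTP transport.
PROTOCOL_VERSION = "2025-03-26"

# Monotonic request ids so responses can be matched to requests.
_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request(client_name: str, client_version: str) -> dict:
    """First message of an MCP session."""
    return jsonrpc_request("initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": client_version},
    })


def initialized_notification() -> dict:
    """Notification sent after the server answers `initialize` (no id)."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def list_tools_request() -> dict:
    return jsonrpc_request("tools/list")


def call_tool_request(name: str, arguments: dict) -> dict:
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def streamable_http_headers(session_id: Optional[str] = None) -> dict:
    """Headers for a POST to the MCP endpoint; the server may answer with JSON or SSE."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    return headers
```

Each message would be sent as the body of an HTTP POST to the server's MCP endpoint. If the server returns an `Mcp-Session-Id` header on the initialize response, the client includes it on subsequent requests.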

Persistent configuration

Configuration is not limited to environment variables. internal/config.py defines database-backed ConfigVar and AppConfig state, with optional Redis synchronization. main.py populates app.state.config with model provider settings, auth settings, feature toggles, RAG options, web search providers, image/audio options, tool server connections, terminal server connections, and more.

Realtime browser integration

The frontend root layout opens a Socket.IO connection and stores it in Svelte stores. It also initializes a Pyodide worker for browser-side Python execution and handles session-targeted events such as Python execution, tool execution, and direct chat completion requests.

Internal architecture

Backend bootstrap

backend/open_webui/main.py is the central runtime entry point. Its lifespan handler:

The same file mounts middleware, routers, static assets, the Socket.IO app, health endpoints, config/version endpoints, OAuth client callback routes, and the SPA fallback.

Data and configuration layer

internal/db.py sets up both synchronous and asynchronous SQLAlchemy engines. The sync engine supports startup tasks, migrations, health checks, and configuration reads. The async engine is used by runtime FastAPI dependencies. SQLite, SQLCipher, and PostgreSQL paths are handled, including SQLite WAL pragmas and PostgreSQL async driver configuration.

internal/config.py stores application configuration as database-backed JSON state. This is important operationally because administrators can update settings at runtime without rebuilding images.
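The idea of database-backed configuration can be sketched with the standard library alone. Everything named here is an assumption for illustration: the `ConfigStore` class, the `config` table layout, the key names, and the precedence rule (a stored value wins over the environment default). None of it reproduces internal/config.py.

```python
import json
import os
import sqlite3
from typing import Any, Mapping


class ConfigStore:
    """Hypothetical key/value config store persisted as JSON in SQLite."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS config (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def get(self, key: str, env_name: str, default: Any,
            env: Mapping[str, str] = os.environ) -> Any:
        """Return the stored value, else the environment value, else the default."""
        row = self.conn.execute(
            "SELECT value FROM config WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])
        return env.get(env_name, default)

    def set(self, key: str, value: Any) -> None:
        """Persist a runtime change so it survives restarts."""
        self.conn.execute(
            "INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.conn.commit()
```

With this shape, an administrator's runtime change is a `set` call against the database, and the environment variable only supplies the initial default, which is why no image rebuild is needed.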

Frontend structure

The frontend lives under src/:

Router and model boundaries

Backend routers mirror user-facing capabilities:

SQL/data models under backend/open_webui/models/ provide the persistent shape behind these routers.

End-to-end flow

sequenceDiagram
    participant User
    participant UI as SvelteKit frontend
    participant API as FastAPI backend
    participant DB as SQL database
    participant RAG as Retrieval/vector layer
    participant Tool as Tool or MCP/OpenAPI server
    participant Model as Ollama/OpenAI-compatible provider
    User->>UI: Send chat message with optional files/tools
    UI->>API: POST chat payload + auth token
    API->>DB: Load user, model, permissions, chat state
    API->>RAG: Query selected files/knowledge/web results
    RAG-->>API: Ranked context chunks
    API->>Tool: Optional tool discovery or tool call
    Tool-->>API: Tool result/resource data
    API->>Model: Completion request with messages, context, tools
    Model-->>API: Streaming or non-streaming response
    API->>DB: Persist messages, metadata, usage
    API-->>UI: Stream tokens/events or final response
    UI-->>User: Render answer, citations, tool results, artifacts

Runtime and data flow

Authentication and session flow

The frontend fetches /api/config, authenticates through the backend, stores user state in Svelte stores, and opens a Socket.IO connection with the token. Backend middleware handles auth tokens, sessions, CORS, security headers, optional audit logging, and optional Redis-backed session storage.

Chat flow

The runtime chat path is coordinated by:

RAG ingestion and query flow

  1. User uploads a file, supplies text, enters a URL, or triggers web search.
  2. routers/files.py and routers/retrieval.py store the file and extract text.
  3. Retrieval helpers chunk documents and call embedding functions.
  4. retrieval/vector/factory.py selects the configured vector backend.
  5. Collections are created or updated with VectorItem records.
  6. Chat-time retrieval queries collections, optionally applies hybrid search/reranking, and injects selected context into the model payload.
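The steps above can be sketched end to end with a toy pipeline. This is an illustration under stated assumptions, not the repository's retrieval code: the bag-of-words `embed` function stands in for a real embedding model, a plain Python list stands in for a vector collection, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping word windows (step 3, chunking)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
        start += size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase token counts. A real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ingest(collection: list, doc_id: str, text: str) -> None:
    """Chunk, embed, and store a document (steps 3 to 5)."""
    for n, chunk in enumerate(chunk_words(text)):
        collection.append({"id": f"{doc_id}:{n}", "text": chunk, "vector": embed(chunk)})


def retrieve(collection: list, query: str, k: int = 2) -> list[str]:
    """Rank stored chunks against the query (step 6, query)."""
    q = embed(query)
    ranked = sorted(collection, key=lambda item: cosine(q, item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:k]]


def build_prompt(query: str, context_chunks: list[str]) -> str:
    """Inject the selected context into the model payload (step 6, injection)."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"
```

Hybrid search and reranking would slot in between `retrieve` and `build_prompt`: a second scoring pass over the candidate chunks before the top results are injected.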

Tool and MCP flow

External tool connections are represented in app configuration. utils/tools.py can read OpenAPI specs, convert operations into tool payloads, execute HTTP operations, and cache server data. For MCP servers, utils/mcp/client.py initializes a Streamable HTTP MCP session, lists tools/resources, and calls tools. Local DB-backed tools and functions are still subject to access checks before execution.
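The OpenAPI half of this flow, turning operations into tool payloads, can be sketched as follows. The output shape (`name`, `description`, `parameters`, `method`, `path`) and the function name `openapi_to_tools` are hypothetical; the actual conversion in utils/tools.py may use a different structure and handle request bodies, `$ref` resolution, and other details omitted here.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def openapi_to_tools(spec: dict) -> list[dict]:
    """Convert OpenAPI operations into simple tool descriptions (illustrative only)."""
    tools = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            # Skip non-operation keys (e.g. path-level "parameters") and unnamed operations.
            if method not in HTTP_METHODS or "operationId" not in operation:
                continue
            properties, required = {}, []
            for param in operation.get("parameters", []):
                properties[param["name"]] = {
                    **param.get("schema", {"type": "string"}),
                    "description": param.get("description", ""),
                }
                if param.get("required"):
                    required.append(param["name"])
            tools.append({
                "name": operation["operationId"],
                "description": operation.get("summary") or operation.get("description", ""),
                "parameters": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
                # Routing metadata needed later to execute the HTTP operation.
                "method": method.upper(),
                "path": path,
            })
    return tools
```

The `parameters` object is a JSON Schema, which is the form most function-calling model APIs expect, while `method` and `path` let the backend map a model's tool call back to a concrete HTTP request.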

File and object storage flow

storage/provider.py abstracts storage. The default provider is local storage, but the same interface supports S3, Google Cloud Storage, and Azure Blob Storage. Cloud providers support explicit credentials or platform identity patterns, depending on the provider.
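A provider abstraction of this kind might look like the sketch below. The class names, method names, and the `get_storage` selector are hypothetical and are not taken from storage/provider.py; only the local provider is implemented, and cloud providers would implement the same three methods against their own SDKs.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageProvider(ABC):
    """Hypothetical storage interface shared by local and cloud providers."""

    @abstractmethod
    def upload(self, name: str, data: bytes) -> str:
        """Store data and return a location string."""

    @abstractmethod
    def read(self, location: str) -> bytes:
        """Return the bytes stored at location."""

    @abstractmethod
    def delete(self, location: str) -> None:
        """Remove the object at location if it exists."""


class LocalStorage(StorageProvider):
    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def upload(self, name: str, data: bytes) -> str:
        # Keep only the final path component so a name cannot escape the root.
        target = self.root / Path(name).name
        target.write_bytes(data)
        return str(target)

    def read(self, location: str) -> bytes:
        return Path(location).read_bytes()

    def delete(self, location: str) -> None:
        Path(location).unlink(missing_ok=True)


def get_storage(kind: str, **options) -> StorageProvider:
    """Select a provider by name (only "local" exists in this sketch)."""
    if kind == "local":
        return LocalStorage(options["root"])
    raise ValueError(f"Unsupported storage provider: {kind}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        storage = get_storage("local", root=root)
        location = storage.upload("notes.txt", b"hello")
        print(storage.read(location))
```

Because callers only see `StorageProvider`, switching from local disk to object storage becomes a configuration change rather than a code change in the routers.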

Deployment and operations topology

flowchart TB
    subgraph ClientLayer["Client layer"]
        Browser["Web browser"]
        Electron["Optional desktop shell"]
    end
    subgraph AppContainer["open-webui container or Python process"]
        Frontend["Built Svelte assets<br/>/app/build"]
        FastAPI["FastAPI app<br/>port 8080"]
        SocketIO["Socket.IO app<br/>/ws"]
        Static["Static/cache files"]
    end
    subgraph StateLayer["State layer"]
        SQL["SQLite volume or PostgreSQL"]
        Redis["Redis/Valkey<br/>optional"]
        ObjectStore["Local/S3/GCS/Azure storage"]
        VectorDB["Vector DB backend"]
    end
    subgraph ProviderLayer["Provider and integration layer"]
        Ollama["Ollama service"]
        OpenAI["OpenAI-compatible endpoints"]
        WebSearch["Web search providers"]
        ToolServers["OpenAPI/MCP tool servers"]
        OTel["OpenTelemetry collector"]
    end
    Browser --> FastAPI
    Browser <--> SocketIO
    Electron --> Browser
    FastAPI --> Frontend
    FastAPI --> Static
    FastAPI --> SQL
    FastAPI --> Redis
    FastAPI --> ObjectStore
    FastAPI --> VectorDB
    FastAPI --> Ollama
    FastAPI --> OpenAI
    FastAPI --> WebSearch
    FastAPI --> ToolServers
    FastAPI -. optional .-> OTel

Extension points

Backend routes and data models

New backend capabilities usually require:

Provider integrations

Provider work typically extends routers/openai.py, routers/ollama.py, utils/chat.py, or provider-specific utility modules. The existing pattern is to normalize provider differences at the backend boundary so the frontend can stay centered on chat/model abstractions.
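To make "normalizing at the backend boundary" concrete, here is a small sketch that maps a non-streaming Ollama chat response onto an OpenAI-style chat completion shape. The field names follow the public response formats of the two APIs; the function names are hypothetical, and the repository's own conversion helpers may cover more fields (usage, tool calls, streaming chunks).

```python
def ollama_to_openai(response: dict) -> dict:
    """Map an Ollama /api/chat style response to an OpenAI-style completion (illustrative)."""
    message = response.get("message", {})
    return {
        "object": "chat.completion",
        "model": response.get("model", ""),
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": message.get("role", "assistant"),
                    "content": message.get("content", ""),
                },
                "finish_reason": "stop" if response.get("done") else None,
            }
        ],
    }


def extract_text(response: dict) -> str:
    """Return the assistant text regardless of which provider shape was received."""
    if "choices" not in response:
        response = ollama_to_openai(response)
    return response["choices"][0]["message"]["content"]
```

With one canonical shape, frontend code can read `choices[0].message.content` without branching on the provider.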

Retrieval and vector DBs

Vector DB support is intentionally pluggable. To add another vector backend, implement the VectorDBBase contract from retrieval/vector/main.py and wire it into retrieval/vector/factory.py. Keep collection naming, tenant behavior, search result shape, and delete/reset semantics consistent.
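The adapter-plus-factory pattern can be sketched as below. The abstract methods shown (`insert`, `search`, `delete_collection`), the `VectorBackend` and `InMemoryVectorDB` classes, and the `BACKENDS` registry are hypothetical stand-ins; consult retrieval/vector/main.py for the real VectorDBBase methods and retrieval/vector/factory.py for the real wiring.

```python
import math
from abc import ABC, abstractmethod


class VectorBackend(ABC):
    """Hypothetical stand-in for the adapter contract."""

    @abstractmethod
    def insert(self, collection: str, items: list[dict]) -> None: ...

    @abstractmethod
    def search(self, collection: str, vector: list[float], limit: int) -> list[dict]: ...

    @abstractmethod
    def delete_collection(self, collection: str) -> None: ...


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorDB(VectorBackend):
    """Toy backend that keeps every collection in a dict."""

    def __init__(self):
        self._collections: dict[str, list[dict]] = {}

    def insert(self, collection, items):
        self._collections.setdefault(collection, []).extend(items)

    def search(self, collection, vector, limit):
        # Every backend returns the same result shape: id, text, score.
        scored = [
            {"id": it["id"], "text": it["text"], "score": cosine(vector, it["vector"])}
            for it in self._collections.get(collection, [])
        ]
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:limit]

    def delete_collection(self, collection):
        self._collections.pop(collection, None)


# Registry the factory consults; a new backend is added with one entry here.
BACKENDS = {"memory": InMemoryVectorDB}


def get_vector_backend(name: str) -> VectorBackend:
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unsupported vector backend: {name}") from None
```

The consistency requirements in the paragraph above correspond to the parts every adapter must agree on here: the collection name passed to each method, the `id`/`text`/`score` result shape, and the behavior of `delete_collection` on a missing collection.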

Tool servers and MCP

External tools can enter through:

Frontend application

The frontend extension pattern is route-first. Add a route under src/routes/, a reusable component under src/lib/components/, API helpers under src/lib/apis/, and stores only when state must be shared across screens.

Integrations

Open WebUI integrates with many systems, visible in source and package metadata:

Configuration, deployment, and operations

Packaging and startup

High-value environment settings

| Setting area | Examples and source grounding |
| --- | --- |
| Model backends | OLLAMA_BASE_URL, OLLAMA_BASE_URLS, OPENAI_API_BASE_URL, OPENAI_API_BASE_URLS, OPENAI_API_KEY from .env.example and backend config. |
| Secrets | WEBUI_SECRET_KEY, provider keys, OAuth secrets, storage credentials. |
| Data | Docker volume /app/backend/data, DATA_DIR, DATABASE_URL, upload/cache directories. |
| Redis | REDIS_URL, websocket/session settings, Redis task listener, config sync. |
| RAG | Embedding/reranking engines, vector DB selection, web search provider settings, document loader settings. |
| Tooling | Tool server connections, MCP initialization timeout, terminal server connections, code interpreter settings. |
| Security | CORS, trusted forwarded headers, safe mode, audit logging, OAuth/LDAP/SCIM settings. |
| Telemetry | ENABLE_OTEL, ENABLE_OTEL_METRICS, OTLP endpoint settings in docker-compose.otel.yaml and telemetry utilities. |

Operational endpoints

Scaling notes

For a single-node installation, SQLite plus local storage is the simplest path. For horizontal or more durable deployments, use PostgreSQL, Redis/Valkey, external object storage, and an external vector DB. Redis is especially important when websocket coordination, sessions, task commands, or multiple application instances are required.
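A deployment script could encode this guidance as a simple preflight check. The sketch below is hypothetical and not part of the repository: the function name and warning texts are invented, and it inspects only DATABASE_URL and REDIS_URL, the two variables named in the settings table above.

```python
from typing import Mapping


def scale_out_warnings(env: Mapping[str, str]) -> list[str]:
    """Return warnings for settings that are unsuitable for multiple app instances."""
    warnings = []
    database_url = env.get("DATABASE_URL", "")
    if not database_url or database_url.startswith("sqlite"):
        warnings.append(
            "DATABASE_URL is unset or points at SQLite; use PostgreSQL for multiple instances."
        )
    if not env.get("REDIS_URL"):
        warnings.append(
            "REDIS_URL is unset; websocket and session state will not be shared across instances."
        )
    return warnings
```

Calling `scale_out_warnings(os.environ)` before starting a second replica gives an early signal; an empty list means neither of the two checked settings blocks scale-out, not that the deployment is fully production-ready.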

Observability, testing, evaluation, and failure modes

Observability

The backend has optional OpenTelemetry instrumentation in backend/open_webui/utils/telemetry. It instruments FastAPI, SQLAlchemy, Redis, requests, HTTP clients, logging, and system metrics, and sends OTLP data to a collector. docker-compose.otel.yaml provides a Grafana LGTM example. The backend also supports JSON-style logging and optional audit logging through middleware.

Testing and quality gates

Repository metadata shows these quality entry points:

No dependencies were installed and no long builds were run for this documentation pass.

Failure modes

Security and governance risks

Open WebUI is a powerful administrative surface. Its security posture depends heavily on deployment configuration and role assignment.

Key risks and controls:

Lifecycle and dependency diagram

flowchart TD
    Request["User request"] --> Auth{"Authenticated and authorized?"}
    Auth -->|No| Reject["Reject or redirect to auth"]
    Auth -->|Yes| ModelChoice["Resolve model and access policy"]
    ModelChoice --> Inputs{"Files, knowledge, tools, or direct connection?"}
    Inputs -->|Files / knowledge| Retrieval["Retrieval pipeline<br/>load, chunk, embed, query"]
    Inputs -->|Tool call| ToolPath["Tool pipeline<br/>local, OpenAPI, or MCP"]
    Inputs -->|Direct provider| Direct["Direct connection path"]
    Inputs -->|Plain chat| Prompt["Build model payload"]
    Retrieval --> Prompt
    ToolPath --> Prompt
    Direct --> Provider["Provider request"]
    Prompt --> Provider
    Provider --> Stream{"Streaming?"}
    Stream -->|Yes| Events["Stream events/tokens to UI"]
    Stream -->|No| Final["Return final response"]
    Events --> Persist["Persist messages, metadata, usage"]
    Final --> Persist
    Persist --> Audit["Optional audit/telemetry/evaluation"]

Reading guide

Recommended reading order:

  1. README.md, .env.example, docker-compose.yaml, and Dockerfile for product scope and runtime assumptions.
  2. backend/open_webui/main.py to understand application startup, middleware, routers, and health endpoints.
  3. backend/open_webui/config.py, env.py, internal/db.py, and internal/config.py for configuration and persistence.
  4. backend/open_webui/routers/openai.py, routers/ollama.py, utils/chat.py, and utils/middleware.py for chat/model flow.
  5. backend/open_webui/routers/retrieval.py and retrieval/vector/* for RAG.
  6. backend/open_webui/utils/tools.py and utils/mcp/client.py for tool server integration.
  7. src/routes/+layout.svelte, src/routes/(app)/+layout.svelte, src/lib/stores/index.ts, and src/lib/apis/* for frontend behavior.
  8. docs/SECURITY.md before enabling tools/functions for broad user groups.

Learning path

  1. Start with a single-user Docker deployment using local storage and one Ollama or OpenAI-compatible provider.
  2. Add a persistent database and inspect how chats, users, models, and files map to backend models.
  3. Configure one knowledge collection and trace upload-to-vector-to-chat retrieval.
  4. Add one external OpenAPI or MCP tool server and inspect access checks.
  5. Enable Redis and review websocket/session behavior.
  6. Add OpenTelemetry and audit logging before moving toward shared or production-like usage.
  7. Finally, tune RBAC, groups, model access, storage retention, and backup policy.

Glossary

| Term | Meaning in this repository |
| --- | --- |
| Open WebUI | The full-stack self-hosted AI workspace implemented by this repo. |
| SvelteKit | Frontend framework used under src/. |
| FastAPI | Backend framework used under backend/open_webui/. |
| Ollama router | Backend proxy/router for local or remote Ollama instances. |
| OpenAI router | Backend proxy/router for OpenAI-compatible model APIs. |
| RAG | Retrieval-augmented generation using files, web content, embeddings, vector search, and reranking. |
| Vector DB | Storage/search backend for embedded chunks. |
| Tool server | External OpenAPI or MCP server that exposes callable tools. |
| Function | Local executable extension managed by Open WebUI. |
| AppConfig | Database-backed runtime configuration wrapper. |
| Redis/Valkey | Optional cache/session/websocket/task coordination layer. |
| OTEL | OpenTelemetry instrumentation and export. |

Repository-Grounded Deep Dive

Open WebUI is a full-stack AI control plane: SvelteKit frontend, FastAPI backend, model-provider routers, RAG pipeline, tool/function execution, MCP client integration, database-backed configuration, and optional telemetry/cache infrastructure. The key source boundaries are github-repos/06-tooling-mcp-ai-platform/open-webui/src/ for frontend routes and components, backend/open_webui/main.py for backend bootstrap, backend/open_webui/routers/ for API domains, backend/open_webui/retrieval/ for ingestion and retrieval, backend/open_webui/retrieval/vector/ for vector database adapters, backend/open_webui/utils/mcp/client.py for MCP integration, backend/open_webui/utils/telemetry/ for OTEL, and docker-compose*.yaml for runtime topology examples.

flowchart LR
    Browser["SvelteKit UI src/routes and src/lib"] --> API["FastAPI backend backend/open_webui/main.py"]
    API --> Auth["auths, users, groups routers"]
    API --> Chat["chats, models, openai, ollama routers"]
    API --> RAG["retrieval, files, knowledge routers"]
    API --> Tools["tools, functions, skills routers"]
    API --> Config["internal db and AppConfig"]
    Chat --> Providers["Ollama and OpenAI-compatible providers"]
    RAG --> Vector["retrieval/vector adapters"]
    Tools --> MCP["utils/mcp/client.py"]
    API --> Telemetry["utils/telemetry"]

The main architectural issue is that chat, retrieval, tool execution, and provider routing all meet at the user conversation boundary. A single prompt can traverse auth policy, model access policy, file permissions, vector search, reranking, function execution, MCP calls, and outbound model requests. That means production review must include data governance and action governance, not only model-provider configuration.

sequenceDiagram
    participant User as Browser user
    participant UI as SvelteKit frontend
    participant API as FastAPI backend
    participant RAG as Retrieval pipeline
    participant Tool as Function or MCP tool
    participant Model as Model provider
    participant DB as Database and config
    User->>UI: send chat message
    UI->>API: authenticated chat request
    API->>DB: load user, model, group, and config policy
    API->>RAG: optional file or knowledge retrieval
    RAG-->>API: chunks, citations, rerank scores
    API->>Tool: optional tool or MCP invocation
    Tool-->>API: structured tool result
    API->>Model: prompt, context, tools, policy
    Model-->>API: streamed or full completion
    API-->>UI: response, citations, events

flowchart TD
    Risk["Production risk"] --> Auth["auth and RBAC drift"]
    Risk --> Provider["provider key and routing drift"]
    Risk --> Retrieval["RAG index inconsistency"]
    Risk --> Tool["tool or function overreach"]
    Risk --> Config["database-backed config mutation"]
    Risk --> Telemetry["missing audit/telemetry"]
    Auth --> A1["user can access wrong model or file"]
    Provider --> P1["requests route to unexpected backend"]
    Retrieval --> R1["chunks embedded with old model remain active"]
    Tool --> T1["MCP/function executes beyond intended scope"]
    Config --> C1["runtime setting changes without release trace"]
    Telemetry --> O1["incident cannot reconstruct prompt path"]

Production Readiness Checklist

Senior Architect Reading Path

Start with backend/open_webui/main.py, backend/open_webui/config.py, and backend/open_webui/internal/db.py to understand process and state. Then read routers by operational domain: auth/users/groups, models/openai/ollama, chats, retrieval/files/knowledge, and tools/functions/skills. Move next to backend/open_webui/retrieval/vector/ and backend/open_webui/utils/mcp/client.py. Finish with frontend paths under src/routes/, src/lib/apis/, src/lib/stores/, and src/lib/components/ to see how backend capabilities become user workflows.