Chroma Architecture Notes

Source Basis

This document is based on static inspection of the local repository at github-repos/04-rag-vector-database/chroma. The main files and directories used were README.md, DEVELOP.md, pyproject.toml, Cargo.toml, Dockerfile, docker-compose.yml, chromadb/__init__.py, chromadb/config.py, chromadb/api/client.py, chromadb/api/fastapi.py, chromadb/api/rust.py, chromadb/test, rust/python_bindings, rust/frontend, rust/log, rust/sysdb, rust/segment, rust/index, rust/worker, rust/chroma, clients, go, schemas, examples, deployments, and k8s.

Executive Summary

Chroma is open-source data infrastructure for AI applications. In a RAG system, it acts as the vector and retrieval data layer for documents, metadata, embeddings, filters, and search. The repository contains both a Python package (chromadb) and a large Rust workspace. The Python package provides the developer-facing client API, local clients, HTTP clients, cloud client helpers, configuration, auth adapters, embedding functions, and tests. The Rust workspace implements the modern storage/query core, Python native bindings, HTTP frontend, local and distributed executors, system database, log, segment management, sparse/full-text indexes, workers, and deployment-oriented services.

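Architecturally, Chroma supports several operating modes: in-process ephemeral, in-process persistent, HTTP client/server, hosted Chroma Cloud, and distributed Kubernetes deployment.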

The codebase is especially relevant to solution architects because it is not only a vector index wrapper. It includes tenancy, database/collection metadata, embedding ingestion, log-based mutation flow, local and distributed query execution, OpenTelemetry hooks, auth configuration, deployment assets, and multiple SDK surfaces.

Problem Solved

AI applications need a retrieval substrate that can keep vectors, source text, metadata, filters, and search behavior close to application code without forcing every team to build storage and ANN plumbing. Chroma solves that problem by providing:

For RAG, the main value is reducing the distance between application code and reliable retrieval. Developers can start locally with a few lines of Python and later move toward server or distributed modes when scale, governance, or operational boundaries require it.

Role in an AI Stack

Chroma usually fits into the AI stack as follows:

Chroma does not own the whole RAG workflow. Chunking, prompt assembly, model calls, reranking, user authorization, and business governance remain application responsibilities unless explicitly integrated around Chroma.

Source Tree Map

Important repository areas:

chroma/
  README.md                         Product quick start and high-level API examples.
  DEVELOP.md                        Development setup, tests, distributed dev, Tilt, Kubernetes notes.
  pyproject.toml                    Python package metadata, dependencies, CLI entry point, maturin build.
  Cargo.toml                        Rust workspace and shared crate dependencies.
  Dockerfile, docker-compose.yml    Container/server runtime and local compose example.
  chromadb/
    __init__.py                     Client constructors: Ephemeral, Persistent, Rust, HTTP, Cloud.
    config.py                       Settings, component system, environment-driven implementation selection.
    api/
      client.py                     Python Client and AdminClient facade.
      fastapi.py                    HTTP client adapter for `/api/v2`.
      rust.py                       RustBindingsAPI adapter for local native storage.
    utils/embedding_functions/      Built-in embedding provider integrations.
    test/                           Python, API, auth, config, distributed, property, persistence tests.
  rust/
    python_bindings/                PyO3 bindings exposing Rust core to Python.
    frontend/                       Axum HTTP server, API routes, frontend service implementation.
    log/                            SQLite-backed mutation log implementation.
    sysdb/                          SQLite system database for tenants, DBs, collections, segments.
    segment/                        Local segment manager, metadata reader/writer, HNSW integration.
    index/                          Sparse and full-text index implementations.
    worker/                         Worker/query/compaction-related crate.
    chroma/                         Rust client crate.
    cli/                            Rust CLI entrypoint.
  k8s/                              Helm chart and test infrastructure for distributed Chroma.
  deployments/                      Cloud deployment examples for AWS, Azure, GCP, and others.
  clients/, go/, schemas/, examples/ SDKs, schemas, sample apps, integration examples.

Core Concepts

Client

The Python package exposes multiple client constructors in chromadb/__init__.py: EphemeralClient, PersistentClient, RustClient, HttpClient, AsyncHttpClient, CloudClient, Client, and AdminClient. These constructors primarily select settings and API implementation.

Tenant and Database

Chroma models tenants and databases as first-class resources. The Python Client stores tenant and database values, validates them through AdminClient, and scopes collection operations accordingly.

Collection

A collection is the user-facing container for records. It owns names, metadata, optional schema/configuration, dimension, and associated segments. Application code uses collections to add, upsert, update, delete, count, get, query, and search records.

Record

A record can include an ID, document, metadata, embedding, and other fields depending on the operation. RAG systems commonly use one record per chunk or memory item.

Embedding Function

Embedding functions transform text, images, or other inputs into embeddings. The repository includes provider integrations under chromadb/utils/embedding_functions, with support for many external and local providers.

System and Component

chromadb/config.py defines a component system. System maps abstract component types to configured implementations, starts dependencies in order, stops them in reverse order, and stores shared settings. This is how Python-side configuration chooses local Rust bindings, HTTP clients, auth components, telemetry, and other implementation pieces.

Rust Bindings API

chromadb/api/rust.py adapts chromadb_rust_bindings.Bindings into the Python ServerAPI shape. It is the local in-process path for persistent or ephemeral Chroma backed by Rust, SQLite, and local HNSW/segment state.

System Database

rust/sysdb stores metadata about tenants, databases, collections, segments, dimensions, schemas, and collection configuration. The local implementation is SQLite-backed.

Log

rust/log records mutation events. The inspected sqlite_log.rs stores records in SQLite embeddings_queue, supports pulling records by topic/offset, and coordinates backfill/purge messages for compaction.
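The append, pull-by-offset, and purge behavior described above can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not the schema used by sqlite_log.rs: only the table name embeddings_queue comes from the source, while the column names and the append, pull, and purge helpers are assumptions made for the example.

```python
import sqlite3

# In-memory database; the columns below are hypothetical, not Chroma's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE embeddings_queue ("
    " seq_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT NOT NULL,"
    " operation TEXT NOT NULL,"
    " record_id TEXT NOT NULL)"
)


def append(topic, operation, record_id):
    """Append one mutation and return its offset (the auto-assigned seq_id)."""
    cur = conn.execute(
        "INSERT INTO embeddings_queue (topic, operation, record_id) VALUES (?, ?, ?)",
        (topic, operation, record_id),
    )
    return cur.lastrowid


def pull(topic, after_offset, limit=100):
    """Return mutations for one topic with an offset greater than after_offset."""
    return conn.execute(
        "SELECT seq_id, operation, record_id FROM embeddings_queue"
        " WHERE topic = ? AND seq_id > ? ORDER BY seq_id LIMIT ?",
        (topic, after_offset, limit),
    ).fetchall()


def purge(topic, up_to_offset):
    """Delete mutations that have already been compacted."""
    conn.execute(
        "DELETE FROM embeddings_queue WHERE topic = ? AND seq_id <= ?",
        (topic, up_to_offset),
    )


append("collection-a", "add", "doc1")
append("collection-b", "add", "other")
append("collection-a", "delete", "doc1")
all_a = pull("collection-a", after_offset=0)
tail_a = pull("collection-a", after_offset=1)
purge("collection-a", up_to_offset=1)
after_purge = pull("collection-a", after_offset=0)
```

Keeping a monotonically increasing offset per row lets a consumer resume from the last offset it applied, which is why purge only removes rows at or below a known offset.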

Segment

rust/segment manages local indexes and metadata storage. LocalSegmentManager caches HNSW indexes and closes file descriptors on eviction. sqlite_metadata.rs applies logs to metadata tables and supports filters, full-text metadata, and schema evolution.
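The cache-with-eviction behavior can be illustrated with a small least-recently-used cache that closes an index when it is evicted. This is a sketch under assumptions: FakeIndex and IndexCache are hypothetical names, and LRU is one plausible eviction policy rather than the policy confirmed for LocalSegmentManager.

```python
from collections import OrderedDict


class FakeIndex:
    """Hypothetical stand-in for an open HNSW index that holds file descriptors."""

    def __init__(self, collection_id):
        self.collection_id = collection_id
        self.closed = False

    def close(self):
        self.closed = True


class IndexCache:
    """Toy LRU cache that closes an index when it is evicted."""

    def __init__(self, capacity, loader):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._loader = loader
        self._entries = OrderedDict()

    def get(self, collection_id):
        if collection_id in self._entries:
            # Mark as most recently used.
            self._entries.move_to_end(collection_id)
            return self._entries[collection_id]
        index = self._loader(collection_id)
        self._entries[collection_id] = index
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry and release its resources.
            _, evicted = self._entries.popitem(last=False)
            evicted.close()
        return index


cache = IndexCache(capacity=2, loader=FakeIndex)
index_a = cache.get("a")
index_b = cache.get("b")
cache.get("a")            # "a" becomes most recently used
index_c = cache.get("c")  # evicts "b", the least recently used entry
```

Closing on eviction matters because an evicted index that is only dereferenced may keep its file descriptors open until garbage collection, which defeats the purpose of bounding the cache.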

Executor

The frontend delegates reads and searches to an executor. LocalExecutor serves local SQLite/HNSW use cases. DistributedExecutor routes query plans to query services through memberlist and gRPC clients.

Component and System Diagram

flowchart TB
  App[AI application / notebook / service]
  PyClient[chromadb Python Client<br/>chromadb/api/client.py]
  HttpClient[HTTP/Async HTTP Client<br/>chromadb/api/fastapi.py]
  RustApi[RustBindingsAPI<br/>chromadb/api/rust.py]
  Bindings[PyO3 Bindings<br/>rust/python_bindings]
  Server[Axum Frontend Server<br/>rust/frontend]
  Frontend[ServiceBasedFrontend]
  SysDb[SysDb<br/>rust/sysdb]
  Log[Mutation Log<br/>rust/log]
  Executor[Executor]
  LocalExec[LocalExecutor<br/>SQLite + HNSW]
  DistExec[DistributedExecutor<br/>gRPC query services]
  Segment[LocalSegmentManager<br/>rust/segment]
  Index[Index crates<br/>HNSW, sparse, full-text]
  Storage[(SQLite, local files, object/block storage)]
  App --> PyClient
  PyClient --> HttpClient
  PyClient --> RustApi
  HttpClient --> Server
  RustApi --> Bindings
  Bindings --> Frontend
  Server --> Frontend
  Frontend --> SysDb
  Frontend --> Log
  Frontend --> Executor
  Executor --> LocalExec
  Executor --> DistExec
  LocalExec --> Segment
  DistExec --> Query[Query workers/services]
  Segment --> Index
  SysDb --> Storage
  Log --> Storage
  Segment --> Storage

Internal Architecture

Python API Layer

The Python layer gives Chroma its simple developer experience. chromadb/__init__.py builds settings for each client type:

chromadb/api/client.py wraps a ServerAPI and exposes collection and admin operations. It also validates tenant and database existence during client initialization.

Configuration and Component System

chromadb/config.py is the central Python configuration model. Settings includes API implementation selection, server host/port/SSL, persistence path, reset behavior, auth providers, telemetry controls, migration behavior, distributed service implementation choices, gRPC timeouts, and legacy-configuration validation.

The System class is a lightweight dependency container. Components declare dependencies, are started in topological order, and are stopped in reverse dependency order. This matters because the same Python API can point at local Rust bindings, HTTP services, or configured auth/telemetry implementations.
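The start and stop ordering can be sketched with the standard library's graphlib. This is not Chroma's System class: MiniSystem, Component, and the component names are hypothetical, and the sketch only demonstrates topological start order with reverse-order shutdown.

```python
from graphlib import TopologicalSorter


class Component:
    """Hypothetical component that records its lifecycle calls."""

    def __init__(self, name, events):
        self.name = name
        self.events = events

    def start(self):
        self.events.append(f"start:{self.name}")

    def stop(self):
        self.events.append(f"stop:{self.name}")


class MiniSystem:
    """Toy dependency container; an illustration, not chromadb.config.System."""

    def __init__(self):
        self.components = {}
        self.deps = {}
        self.order = []

    def register(self, component, depends_on=()):
        self.components[component.name] = component
        self.deps[component.name] = set(depends_on)

    def start(self):
        # static_order() yields each component after all of its dependencies.
        self.order = list(TopologicalSorter(self.deps).static_order())
        for name in self.order:
            self.components[name].start()

    def stop(self):
        # Stop dependents before the components they rely on.
        for name in reversed(self.order):
            self.components[name].stop()


events = []
system = MiniSystem()
system.register(Component("settings", events))
system.register(Component("telemetry", events), depends_on=["settings"])
system.register(Component("api", events), depends_on=["settings", "telemetry"])
system.start()
system.stop()
```

Stopping in reverse order means no component is shut down while something that depends on it is still running.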

Local Rust Path

The local path runs through RustBindingsAPI and rust/python_bindings. The binding constructor builds a Tokio runtime, registry, SQLite system database, SQLite log, local segment manager, local executor, compaction manager, and a Frontend. Persistent clients use persist_directory/chroma.sqlite3 plus local index files. The code also sizes HNSW cache capacity against file-handle limits because each HNSW index opens multiple files.
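The idea of sizing the HNSW cache against file-handle limits can be shown with simple arithmetic. The formula, the function name hnsw_cache_capacity, and the numbers below are assumptions for illustration; the source states only that the capacity is derived from file-handle limits, not how.

```python
def hnsw_cache_capacity(fd_soft_limit, files_per_index, reserved_fds):
    """Return how many indexes can stay open within a file-descriptor budget.

    All three parameters are hypothetical inputs: the process soft limit,
    the number of files one index keeps open, and descriptors held back
    for SQLite, sockets, and logs.
    """
    if files_per_index <= 0:
        raise ValueError("files_per_index must be positive")
    usable = max(fd_soft_limit - reserved_fds, 0)
    # Always allow at least one cached index so queries can still run.
    return max(usable // files_per_index, 1)


typical = hnsw_cache_capacity(fd_soft_limit=1024, files_per_index=4, reserved_fds=64)
constrained = hnsw_cache_capacity(fd_soft_limit=16, files_per_index=4, reserved_fds=64)
```

On Unix-like systems the soft limit can be read with resource.getrlimit(resource.RLIMIT_NOFILE); it is passed in as a parameter here so the sketch stays portable.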

This path is the default modern local storage route and is critical for notebooks, unit tests, desktop tools, and single-process applications.

HTTP Frontend

rust/frontend/src/server.rs implements the Axum server. It registers /api/v2 routes for health, heartbeat, preflight, version, identity, reset, tenants, databases, collections, records, query, search, and attached functions. It also exposes /docs and /openapi.json, supports CORS, body limits, JSON error conversion, tracing layers, auth, quota, metrics, and graceful shutdown on signals.

rust/frontend/src/lib.rs loads configuration, applies persist paths to SQLite/local segment settings, creates the frontend, builds scorecard/circuit breaker rules, and starts the server.

Service-Based Frontend

rust/frontend/src/impls/service_based_frontend.rs is the main service implementation. It coordinates collection metadata through SysDb, appends mutations to Log, invalidates collection caches, emits metering events, applies retries, and delegates read/query/search execution to an executor.

Local Executor

rust/frontend/src/executor/local.rs implements local count, get, and KNN query flow. It can backfill a collection by sending compaction messages, read metadata through SQLite, load embeddings through HNSW readers, and hydrate documents/metadata. Local search is explicitly not implemented in the inspected code, so of the two executors only the distributed one handles search plans. Architects who rely on search should treat this as a capability difference between local and distributed deployments.

Distributed Executor

rust/frontend/src/executor/distributed.rs routes count, get, KNN, and search plans to distributed query services. It uses memberlist-based assignment, gRPC clients, retry policies, weighted or uniform selection, replication factor, and cache invalidation/replanning when query services return certain errors.
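One way to picture assignment with a replication factor is rendezvous (highest-random-weight) hashing over the current member list. This is a swapped-in technique chosen for illustration: the source says only that assignment is memberlist-based, and assign_replicas, the member names, and the use of SHA-256 are all assumptions.

```python
import hashlib


def _score(member, collection_id):
    """Deterministic pseudo-random weight for a (member, collection) pair."""
    digest = hashlib.sha256(f"{member}:{collection_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_replicas(collection_id, members, replication_factor):
    """Pick the replication_factor members with the highest weights."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    ranked = sorted(
        members,
        key=lambda member: (_score(member, collection_id), member),
        reverse=True,
    )
    return ranked[:replication_factor]


members = ["query-0", "query-1", "query-2", "query-3"]
chosen = assign_replicas("collection-a", members, replication_factor=2)

# Removing a member that was not chosen leaves the assignment unchanged.
unchosen = [m for m in members if m not in chosen]
smaller = [m for m in members if m != unchosen[0]]
chosen_after_removal = assign_replicas("collection-a", smaller, replication_factor=2)
```

The useful property for a query tier is stability: a membership change only moves collections whose chosen replicas were affected, so most cached routing decisions remain valid.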

Storage and Index Layer

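The storage/index layer includes rust/sysdb (the SQLite system database), rust/log (the SQLite-backed mutation log), rust/segment (local segment manager, metadata reader/writer, and HNSW integration), and rust/index (sparse and full-text index implementations).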

These crates show Chroma moving toward a broad retrieval system: dense vector search, metadata filtering, sparse lexical retrieval, and full-text candidate generation.

End-to-End Runtime Flow

sequenceDiagram
  participant App as Application
  participant Client as chromadb Client
  participant API as RustBindingsAPI or HTTP API
  participant FE as ServiceBasedFrontend
  participant Sys as SysDb
  participant Log as Mutation Log
  participant Exec as Executor
  participant Seg as Segment/Index
  App->>Client: create_collection / get_collection
  Client->>API: scoped by tenant and database
  API->>FE: create or fetch collection
  FE->>Sys: persist collection metadata and segments
  App->>Client: add or upsert IDs, docs, metadata, embeddings
  Client->>API: validate and submit records
  API->>FE: add/update/upsert
  FE->>Log: append mutation records
  FE->>Sys: update collection metadata when needed
  Log-->>Seg: compaction/backfill applies log to local indexes
  App->>Client: query by text or embedding with filters
  Client->>API: query request
  API->>FE: plan query
  FE->>Exec: execute local or distributed query
  Exec->>Seg: metadata filter + vector search + hydration
  Seg-->>App: IDs, distances, documents, metadata, embeddings as requested

Add and Upsert Flow

Application code calls collection methods through the Python client. Depending on settings, requests go to local Rust bindings or to the HTTP API. The frontend validates collection state, appends records to the log, updates system metadata if required, and eventually materializes records into local or distributed segment/index state through compaction and execution paths.
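The step where logged mutations are materialized into queryable state can be sketched as a replay over an ordered log. The operation names mirror the add, upsert, update, and delete calls mentioned in this document, but the exact semantics below (for example, that an add for an existing ID is ignored) and the apply_log helper are assumptions, not a description of Chroma's compaction code.

```python
def apply_log(state, log_records):
    """Replay ordered (operation, record_id, payload) tuples onto a dict."""
    for operation, record_id, payload in log_records:
        if operation == "add":
            # Assumed semantics: an add never overwrites an existing record.
            state.setdefault(record_id, payload)
        elif operation == "upsert":
            state[record_id] = payload
        elif operation == "update":
            # Assumed semantics: updates merge fields and skip unknown IDs.
            if record_id in state:
                state[record_id] = {**state[record_id], **payload}
        elif operation == "delete":
            state.pop(record_id, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return state


log = [
    ("add", "c1", {"document": "alpha"}),
    ("add", "c1", {"document": "ignored"}),
    ("upsert", "c2", {"document": "beta"}),
    ("update", "c1", {"topic": "intro"}),
    ("delete", "c2", None),
]
materialized = apply_log({}, log)
```

Because the log is the source of truth and the materialized state is derived, a reader can see stale results until replay catches up, which is why stale-read behavior after ingestion is worth testing.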

Query Flow

For local query, the executor reads metadata filters from SQLite, optionally reduces candidates, uses HNSW readers for vector search, then hydrates requested fields. For distributed query, the frontend creates a plan and the distributed executor selects query services by collection and shard assignment.
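The local query stages (metadata filter, vector search, hydration) can be sketched with a brute-force scan. This substitutes an exact Euclidean scan for HNSW, which is approximate and far faster at scale; the RECORDS layout and the query helper are hypothetical and only illustrate the order of the stages.

```python
import math

# Hypothetical in-memory records; real state lives in SQLite and HNSW files.
RECORDS = {
    "c1": {"embedding": [1.0, 0.0], "document": "alpha", "metadata": {"lang": "en"}},
    "c2": {"embedding": [0.0, 1.0], "document": "beta", "metadata": {"lang": "de"}},
    "c3": {"embedding": [0.9, 0.1], "document": "gamma", "metadata": {"lang": "en"}},
}


def query(records, query_embedding, where, n_results, include):
    # Stage 1: the metadata filter reduces the candidate set.
    candidates = [
        record_id
        for record_id, record in records.items()
        if all(record["metadata"].get(key) == value for key, value in where.items())
    ]
    # Stage 2: vector search over candidates (exact scan instead of HNSW).
    scored = sorted(
        (math.dist(records[record_id]["embedding"], query_embedding), record_id)
        for record_id in candidates
    )
    # Stage 3: hydrate only the fields the caller asked for.
    results = []
    for distance, record_id in scored[:n_results]:
        row = {"id": record_id, "distance": distance}
        for field in include:
            row[field] = records[record_id][field]
        results.append(row)
    return results


hits = query(RECORDS, [1.0, 0.0], where={"lang": "en"}, n_results=2, include=["document"])
```

Hydrating last keeps the expensive document and metadata reads proportional to n_results rather than to the number of candidates.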

Client Mode Decision Flow

flowchart TD
  Need[Application needs Chroma]
  LocalOnly{Same process local store?}
  Persist{Need data after restart?}
  ServiceBoundary{Need separate service boundary?}
  Hosted{Use Chroma Cloud?}
  Ephemeral[EphemeralClient]
  Persistent[PersistentClient / RustClient<br/>persist_directory + SQLite]
  Http[HttpClient or AsyncHttpClient<br/>server at /api/v2]
  Cloud[CloudClient<br/>token auth + hosted endpoint]
  Need --> Hosted
  Hosted -- yes --> Cloud
  Hosted -- no --> LocalOnly
  LocalOnly -- yes --> Persist
  Persist -- yes --> Persistent
  Persist -- no --> Ephemeral
  LocalOnly -- no --> ServiceBoundary
  ServiceBoundary -- yes --> Http
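The decision flow above can also be read as a small function. The function name and boolean parameters are hypothetical; the returned strings are the client constructor names listed in this document, and the function only mirrors the branches drawn in the diagram.

```python
def choose_client(use_cloud, same_process, needs_persistence=False,
                  needs_service_boundary=False):
    """Return the client constructor name suggested by the decision flow."""
    if use_cloud:
        return "CloudClient"
    if same_process:
        return "PersistentClient" if needs_persistence else "EphemeralClient"
    if needs_service_boundary:
        return "HttpClient"
    # The diagram defines no branch for this combination.
    return None


notebook = choose_client(use_cloud=False, same_process=True)
desktop_tool = choose_client(use_cloud=False, same_process=True, needs_persistence=True)
shared_service = choose_client(use_cloud=False, same_process=False,
                               needs_service_boundary=True)
hosted = choose_client(use_cloud=True, same_process=False)
```

Checking the hosted option first matches the diagram, where the Chroma Cloud question is asked before any local or server choice.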

Deployment and Operations Topology

flowchart LR
  subgraph ClientTier[Client tier]
    Py[Python SDK]
    JS[JS/TS SDK]
    Go[Go client]
    Rust[Rust client crate]
    Apps[RAG services and agents]
  end
  subgraph SingleNode[Single-node/server mode]
    Server[Chroma frontend API<br/>port 8000]
    LocalSql[(SQLite sysdb/log)]
    LocalIndex[(Local HNSW and metadata files)]
  end
  subgraph Distributed[Kubernetes distributed mode]
    FE[Rust frontend service]
    SysSvc[SysDB service/migration]
    LogSvc[Log service]
    Query[Query service / workers]
    Compact[Compaction service]
    GC[Garbage collector]
    Member[Memberlist]
    Queue[Work queue]
  end
  subgraph Infra[Backing infrastructure]
    Pg[(Postgres or SQLite depending config)]
    Obj[(Object/block storage, MinIO/S3-like)]
    Obs[OTEL, Prometheus, Grafana, Jaeger]
  end
  Apps --> Py
  Apps --> JS
  Apps --> Go
  Apps --> Rust
  Py --> Server
  JS --> Server
  Go --> Server
  Rust --> Server
  Server --> LocalSql
  Server --> LocalIndex
  Py --> FE
  FE --> SysSvc
  FE --> LogSvc
  FE --> Query
  Query --> Obj
  Compact --> Obj
  SysSvc --> Pg
  LogSvc --> Pg
  FE --> Obs
  Query --> Obs
  Compact --> Obs
  Member <--> Query

Local Python

README.md shows a minimal Python flow: install chromadb, create a client, create a collection, add documents/metadata/IDs, and query. PersistentClient and RustClient are the core local persistent modes.

Server Mode

The README shows chroma run --path /chroma_db_path. The local docker-compose.yml builds the Rust Docker target, persists /data, exposes port 8000, and health-checks /api/v2/heartbeat.

Distributed and Kubernetes

DEVELOP.md and k8s/ show a distributed development story using Docker, Kubernetes, Tilt, Helm, and support services. The Helm chart under k8s/distributed-chroma contains templates for frontend, query, compaction, sysdb, log, work queue, garbage collector, memberlist resources, and migrations. k8s/test includes Prometheus, Grafana, Jaeger, MinIO, Postgres, Spanner, and OTEL collector assets for integration testing.

Cloud and Managed Endpoints

CloudClient configures hosted API access, default host api.trychroma.com, SSL, tenant/database defaults, and token authentication. This code path is useful when the vector data plane is managed outside the application runtime.

Module Dependency and Lifecycle Diagram

stateDiagram-v2
  [*] --> ConfigureClient
  ConfigureClient --> SelectApiImpl
  SelectApiImpl --> LocalBindings: RustBindingsAPI
  SelectApiImpl --> HttpApi: FastAPI/AsyncFastAPI
  SelectApiImpl --> CloudApi: CloudClient
  LocalBindings --> StartSystem
  HttpApi --> ConnectServer
  CloudApi --> ConnectHosted
  StartSystem --> OpenSqlite
  OpenSqlite --> BuildFrontend
  BuildFrontend --> Ready
  ConnectServer --> Ready
  ConnectHosted --> Ready
  Ready --> CollectionCrud
  Ready --> AddOrUpsert
  Ready --> QueryOrSearch
  AddOrUpsert --> AppendLog
  AppendLog --> BackfillOrCompact
  BackfillOrCompact --> QueryOrSearch
  QueryOrSearch --> Ready
  Ready --> Shutdown
  Shutdown --> [*]

flowchart TD
  Frontend[ServiceBasedFrontend]
  Auth[Auth and quota layers]
  Sys[SysDb]
  Log[Log]
  Cache[Collection cache]
  Exec{Executor implementation}
  Local[LocalExecutor]
  Dist[DistributedExecutor]
  Metadata[SQLite metadata reader/writer]
  Hnsw[Local HNSW index cache]
  QuerySvc[gRPC query services]
  Member[Memberlist assignment]
  Auth --> Frontend
  Frontend --> Sys
  Frontend --> Log
  Frontend --> Cache
  Frontend --> Exec
  Exec --> Local
  Exec --> Dist
  Local --> Metadata
  Local --> Hnsw
  Dist --> Member
  Dist --> QuerySvc

Extension Points

Chroma extension points are primarily API, configuration, embedding, and service-implementation choices:

Integrations

The repository shows or supports several integration categories:

Configuration, Deployment, and Operations

Python Package and Build

pyproject.toml defines the chromadb package, Python >=3.9, runtime dependencies such as pydantic, uvicorn, opentelemetry, grpcio, bcrypt, kubernetes, httpx, and the chroma CLI entry point. The build backend is maturin, pointing at rust/python_bindings/Cargo.toml, which explains why local Python operation depends heavily on Rust native code.

Rust Workspace

Cargo.toml defines a broad workspace containing API types, frontend, worker, sysdb, log, storage, segment, index, Python bindings, JavaScript bindings, CLI, and supporting crates. This is where most newer service and retrieval internals live.

Runtime Settings

Operational settings include:

Deployment Practices

Observability, Testing, Evaluation, and Failure Modes

Observability

Chroma includes OpenTelemetry dependencies and settings, tracing in Rust services, HTTP health/heartbeat/preflight routes, Docker health checks, and Kubernetes test infrastructure for Prometheus, Grafana, Jaeger, and OTEL collector. The Rust frontend also includes metrics counters, tracing middleware, scorecard/circuit-breaker concepts, and graceful shutdown.

Useful production signals:

Testing

The repository has several test layers:

For RAG evaluation, add workload-specific tests outside the repository defaults: retrieval recall@k, MRR/NDCG, metadata filter correctness, query latency percentiles, stale-read behavior after ingestion, and answer-quality impact after reranking.
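A minimal harness for the retrieval metrics named above might start from functions like these. They assume binary relevance labels and unique IDs in each retrieved list; the function names and the sample IDs are illustrative.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant IDs that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved, relevant, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(retrieved[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


retrieved = ["c3", "c1", "c9", "c2"]  # ranked IDs returned by a query
relevant = {"c1", "c2"}               # labeled relevant chunks for that query

recall = recall_at_k(retrieved, relevant, k=3)  # one of two relevant IDs in top 3
rr = reciprocal_rank(retrieved, relevant)       # first relevant hit is at rank 2
ndcg = ndcg_at_k(retrieved, relevant, k=4)
```

Averaging reciprocal_rank over a fixed query set gives MRR. Running the same query set before and after an ingestion, embedding-function, or index-configuration change turns these numbers into a regression check.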

Failure Modes

Important failure modes:

Security and Governance Risks

Reading Guide

Recommended reading order:

  1. README.md for the user-facing API and product positioning.
  2. pyproject.toml and chromadb/__init__.py for packaging and client modes.
  3. chromadb/config.py for settings, component wiring, auth, telemetry, and distributed knobs.
  4. chromadb/api/client.py, chromadb/api/fastapi.py, and chromadb/api/rust.py for Python API boundaries.
  5. rust/python_bindings/src/bindings.rs for local persistent runtime composition.
  6. rust/frontend/src/server.rs and rust/frontend/src/impls/service_based_frontend.rs for HTTP/server behavior.
  7. rust/sysdb, rust/log, and rust/segment for metadata, mutation, and local index storage.
  8. rust/frontend/src/executor/local.rs and rust/frontend/src/executor/distributed.rs for query execution behavior.
  9. rust/index/src/sparse/README.md and rust/index/src/fulltext/README.md for hybrid retrieval internals.
  10. k8s, deployments, and DEVELOP.md for distributed development and operations.

Learning Path

  1. Start with the README Python example and create one local collection.
  2. Switch from ephemeral to persistent mode and inspect the generated SQLite/index files.
  3. Use HttpClient against a local server to understand client/server boundaries.
  4. Add metadata filters and document filters; test expected and unexpected metadata types.
  5. Read RustBindingsAPI and python_bindings to understand local runtime composition.
  6. Follow a record from add/upsert through frontend, log, compaction/backfill, segment, and query.
  7. Study LocalExecutor and DistributedExecutor side by side to understand scale-out behavior.
  8. Explore sparse and full-text index READMEs to understand hybrid retrieval direction.
  9. Add a RAG evaluation harness with fixed queries, labeled relevant chunks, and latency budgets.
  10. Review Kubernetes templates only after the local and server-mode internals are clear.

Glossary

Deep-Dive Addendum: Local, Service, and Distributed Boundaries

Chroma's repository is especially useful for architects because it exposes more than one operating shape. The Python surface under github-repos/04-rag-vector-database/chroma/chromadb/api/ supports local clients, HTTP clients, async clients, and collection models. The local index path is visible under chromadb/segment/impl/vector/ and chromadb/segment/impl/metadata/sqlite.py. The Rust side separates rust/frontend/, rust/worker/, rust/log/, rust/sysdb/, rust/blockstore/, rust/storage/, and rust/types/. That split is the clearest source-grounded way to understand Chroma as both an embedded developer database and a service architecture.

flowchart LR
  User["Application code"] --> PyApi["Python API chromadb/api"]
  PyApi --> LocalClient["PersistentClient or local segment API"]
  PyApi --> HttpClient["HTTP/async FastAPI client"]
  LocalClient --> LocalHnsw["local_hnsw and local_persistent_hnsw"]
  LocalClient --> SqliteMeta["metadata SQLite segment"]
  HttpClient --> Frontend["rust/frontend service"]
  Frontend --> SysDb["rust/sysdb metadata"]
  Frontend --> Log["rust/log mutation stream"]
  Frontend --> Executor["local or distributed executor"]
  Executor --> Worker["rust/worker query and compaction services"]
  Worker --> Storage["rust/storage and blockstore"]

The key design question is where responsibility for durability, indexing, and query execution sits. In local mode, the application process is close to persistence and HNSW state, which is good for simple development and small deployments but makes resource governance part of the application host. In service mode, the frontend, log, sysdb, worker, and storage layers can be reasoned about separately. The repository makes this visible through rust/frontend/src/impls/in_memory_frontend.rs, rust/frontend/src/impls/service_based_frontend.rs, rust/frontend/src/executor/local.rs, rust/frontend/src/executor/distributed.rs, and worker binaries such as rust/worker/src/bin/query_service.rs and rust/worker/src/bin/compaction_service.rs.

Ingestion, Query, and Compaction Lifecycle

sequenceDiagram
  participant App as App or SDK
  participant API as Python or HTTP API
  participant FE as Frontend
  participant Log as Mutation log
  participant WQ as Work queue
  participant Worker as Query or compaction worker
  participant Store as Segment and block storage
  App->>API: add/upsert/query collection records
  API->>FE: normalized collection operation
  FE->>Log: append mutation when write path is used
  FE->>WQ: enqueue compaction or execution work
  WQ->>Worker: dispatch work item
  Worker->>Store: read or materialize segment state
  Store-->>Worker: vectors, metadata, or compacted block data
  Worker-->>FE: result set or compaction status
  FE-->>API: response
  API-->>App: records, distances, documents, metadata

For RAG systems, the important production issue is not only whether query() returns similar documents. It is whether the ingestion path, metadata migrations, compaction cadence, and embedding-function configuration are all stable enough to give repeatable retrieval. Files like chromadb/utils/embedding_functions/, chromadb/api/collection_configuration.py, chromadb/migrations/, rust/sqlite/migrations/, and rust/spanner-migrations/ show that model configuration and metadata storage are part of the architecture, not convenience details.

Operating-Mode Decision Tree

flowchart TD
  Start["Choose Chroma operating mode"] --> Dev{"Single developer or notebook?"}
  Dev -->|yes| Local["PersistentClient with local HNSW and SQLite"]
  Dev -->|no| Multi{"Multiple app instances or users?"}
  Multi -->|yes| Server["HTTP/service deployment"]
  Multi -->|no| Local
  Server --> Scale{"Need independent compaction/query workers?"}
  Scale -->|yes| Distributed["Frontend + log + sysdb + worker + storage"]
  Scale -->|no| SingleNode["Single-node server configuration"]
  Distributed --> Govern["Govern embedding functions, schema, auth, quotas"]
  SingleNode --> Govern
  Local --> Govern

Production Readiness Checklist

Senior Architect Reading Guide

Start with chromadb/api/client.py, chromadb/api/fastapi.py, and chromadb/api/rust.py to see the API shapes. Then read the local storage pieces under chromadb/segment/impl/vector/ and chromadb/segment/impl/metadata/sqlite.py. After that, move to rust/frontend/src/, then rust/worker/src/, then rust/log/src/ and rust/sysdb/src/. This order makes the difference between embedded Chroma and service Chroma visible before diving into lower-level block storage or deployment manifests.

Additional Glossary