Data, RAG & Retrieval Layer

The Data/RAG layer turns organizational knowledge into controlled context for the model. It is where documents, databases, tickets, policies, code, and product knowledge become retrievable evidence.

RAG is not just a vector database. It is the full pipeline that makes enterprise data usable, permission-aware, fresh, grounded, and measurable.

What this layer owns

| Concern | Ownership |
| --- | --- |
| Source selection | Which systems and documents are allowed into the AI context |
| Ingestion | How content enters the index |
| Parsing | How PDFs, HTML, code, tables, docs, and tickets become text/metadata |
| Chunking | How content is split for retrieval |
| Embedding and indexing | How meaning and keywords become searchable |
| Retrieval | How the right chunks are selected |
| Reranking | How candidates are ordered by relevance |
| Grounding | How answers cite and use retrieved context |
| Freshness | How stale content is detected or removed |
| Permissions | How user access is enforced at retrieval time |

RAG pipeline

```mermaid
flowchart LR
    A[Source systems] --> B[Ingestion]
    B --> C[Parsing and chunking]
    C --> D[Embedding]
    D --> E[Vector store / hybrid index]
    E --> F[Retriever]
    F --> G[Reranker]
    G --> H[Prompt context]
    H --> I[LLM answer]
    J[Permissions and freshness policy] -. controls .-> B
    J -. controls .-> E
    J -. controls .-> F
```
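
To make the "Parsing and chunking" stage concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes character-based windows and a plain metadata dict; the names `DocChunk` and `chunk_text` and the default sizes are illustrative, not taken from any framework. Real pipelines often split on tokens, sentences, or document structure instead.

```python
from dataclasses import dataclass, field


@dataclass
class DocChunk:
    """One retrievable piece of a document plus the metadata it inherits."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, metadata: dict, size: int = 200, overlap: int = 50) -> list[DocChunk]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: list[DocChunk] = []
    for start in range(0, len(text), step):
        window = text[start:start + size]
        # Every chunk carries the document metadata plus its own offset.
        chunks.append(DocChunk(window, {**metadata, "start": start}))
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Because each chunk copies the document-level metadata (source, tenant, version, and so on), later stages can filter on it without going back to the source system.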

Permission-aware retrieval

Permission-aware retrieval means the system retrieves only what the current user, tenant, role, or service is allowed to see.

Do not rely on the model to ignore forbidden content. The retriever must filter before context reaches the model.
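
A minimal sketch of filtering before prompt construction, assuming each chunk carries a tenant and a set of allowed roles as metadata. The `User` and `StoredChunk` types and the `build_context` helper are hypothetical names for illustration; a production system would usually push the same filter down into the index query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    tenant: str
    roles: frozenset[str]


@dataclass(frozen=True)
class StoredChunk:
    text: str
    tenant: str
    allowed_roles: frozenset[str]


def is_visible(user: User, chunk: StoredChunk) -> bool:
    """A chunk is visible only within the user's tenant and with a shared role."""
    return chunk.tenant == user.tenant and bool(user.roles & chunk.allowed_roles)


def build_context(user: User, candidates: list[StoredChunk], limit: int = 3) -> list[str]:
    """Drop forbidden chunks first, then truncate; the model never sees them."""
    visible = [c for c in candidates if is_visible(user, c)]
    return [c.text for c in visible[:limit]]
```

Filtering before truncation matters: if the top-k cut happened first, forbidden chunks could crowd out permitted ones and the user would get less context than they are entitled to.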

| Pattern | When to use |
| --- | --- |
| Metadata filter | Tenant, department, role, document type |
| ACL sync | Enterprise documents with existing access rules |
| Query-time policy check | Sensitive systems or mixed data classes |
| Separate indexes | Hard isolation across tenants or regulated domains |
| Redaction pipeline | PII, secrets, or contractual data |
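
As one illustration of the redaction pattern, the sketch below masks email addresses and secret-like tokens before text is indexed. Both regular expressions are assumptions: the email pattern is deliberately simple, and the `sk-` key format is a hypothetical example rather than any vendor's real format. Regex-only redaction misses many PII types, so treat this as a starting point.

```python
import re

# Simplified email pattern; not a full RFC 5322 matcher.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical secret format: "sk-" followed by 16 or more alphanumerics.
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def redact(text: str) -> str:
    """Replace emails and secret-like tokens with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return API_KEY.sub("[SECRET]", text)
```

Running redaction at ingestion time, before embedding, keeps the sensitive values out of both the index and any prompt built from it.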

RAG quality levers

| Lever | What can go wrong | How to improve |
| --- | --- | --- |
| Source quality | old, duplicated, contradictory documents | source curation and freshness rules |
| Parsing quality | tables/code/images are lost | document-aware parsing |
| Chunk size and overlap | context is too small or too noisy | evaluate chunk settings by query type |
| Embedding model | semantic matches are weak | compare embeddings on real queries |
| Vector vs hybrid retrieval | keyword-heavy queries fail | combine vector and keyword retrieval |
| Reranking | good candidates are buried | rerank top-k candidates |
| Citation and grounding | model answers without evidence | require citations and context checks |
| Freshness | stale policies answer current questions | version and expiry metadata |
| Access control | data leakage | filter before prompt construction |
| Eval datasets | no measurable quality | create golden queries and expected evidence |
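
One common way to combine vector and keyword retrieval is reciprocal rank fusion (RRF), sketched below. This is an illustrative choice, not the only fusion method; the function name is hypothetical, and `k = 60` is a conventional default that should be tuned against your own queries.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by more than one retriever rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["a", "b", "c"]   # e.g. from an embedding search
keyword_hits = ["c", "a", "d"]  # e.g. from a keyword (BM25-style) search
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF uses only ranks, so it avoids having to normalize similarity scores that come from different retrievers on different scales.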

Where LangChain and LlamaIndex fit

LangChain can orchestrate RAG flows inside an AI application: loaders, retrievers, tools, prompts, and chains. LlamaIndex is often used when the data/indexing layer is the central concern, especially document ingestion, indexing abstractions, retrieval strategies, and knowledge workflows.

These frameworks do not replace governance. Use OpenSpec or Spec Kit to define the RAG feature, AI-DLC for high-risk delivery, and eval/observability tools to prove behavior.

Step-by-step RAG implementation guide

  1. Define the user question class: support, policy, product docs, code search, analytics.
  2. Define allowed sources and data owners.
  3. Define permissions before indexing.
  4. Build a small golden dataset: 30-100 representative questions with expected source evidence.
  5. Ingest a narrow source set first.
  6. Parse and chunk with metadata: source, owner, freshness, tenant, ACL, version.
  7. Compare retrieval strategies: vector, keyword, hybrid, rerank.
  8. Build answer generation with citations.
  9. Add refusal behavior when evidence is missing.
  10. Run evals in CI before changing chunking, prompts, retrievers, or models.
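
Steps 4 and 10 can be sketched as a small retrieval eval: a golden dataset of questions with expected sources, and a hit-rate metric that a CI job could compare against a threshold. The dataset entries, the `fake_retrieve` stub, and the document IDs are all hypothetical; in practice `retrieve` would call your real retriever.

```python
from typing import Callable

# Hypothetical golden dataset: each question lists the sources that count as evidence.
GOLDEN = [
    {"question": "How many vacation days do new hires get?",
     "expected_sources": {"hr-policy-v3"}},
    {"question": "How do I rotate an API key?",
     "expected_sources": {"security-runbook"}},
]

# Stand-in for a real retriever, so the example runs on its own.
FAKE_INDEX = {
    "How many vacation days do new hires get?": ["hr-policy-v3", "hr-faq"],
    "How do I rotate an API key?": ["billing-faq", "onboarding-guide"],
}


def fake_retrieve(question: str, k: int) -> list[str]:
    return FAKE_INDEX.get(question, [])[:k]


def hit_rate_at_k(golden: list[dict], retrieve: Callable[[str, int], list[str]], k: int = 5) -> float:
    """Fraction of questions where at least one expected source is in the top k."""
    if not golden:
        return 0.0
    hits = sum(
        1 for case in golden
        if set(retrieve(case["question"], k)) & case["expected_sources"]
    )
    return hits / len(golden)


score = hit_rate_at_k(GOLDEN, fake_retrieve, k=2)
```

In CI, a check such as `assert score >= threshold` (with a threshold you choose) would fail the build when a chunking, prompt, retriever, or model change degrades retrieval.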

Failure modes

| Failure mode | Symptom | Fix |
| --- | --- | --- |
| Vector DB treated as the whole RAG system | answers are inconsistent and hard to debug | design the full pipeline |
| No golden dataset | every prompt change feels subjective | create query/evidence evals |
| No permission filter | confidential chunks leak into prompts | enforce ACL before retrieval output |
| Stale index | old policies are cited | freshness metadata and re-index jobs |
| Too much retrieved context | model ignores key facts | reranking and context compression |
| No citation requirement | hallucinations look confident | grounded answer format |
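
The stale-index and missing-citation fixes can be sketched together: drop expired evidence, refuse when nothing fresh remains, and number the sources so the answer can cite them. The `Evidence` type, the refusal text, and the prompt wording are assumptions for illustration, and `llm` stands for any callable that maps a prompt string to an answer string.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Evidence:
    source: str
    text: str
    expires: Optional[date] = None  # None means no expiry is recorded


REFUSAL = "I could not find current evidence to answer this question."


def build_grounded_prompt(question: str, evidence: list[Evidence], today: date) -> Optional[str]:
    """Return a citation-ready prompt, or None when no fresh evidence exists."""
    fresh = [e for e in evidence if e.expires is None or e.expires >= today]
    if not fresh:
        return None
    context = "\n".join(
        f"[{i}] ({e.source}) {e.text}" for i, e in enumerate(fresh, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n"
        f"{context}\nQuestion: {question}"
    )


def answer(question: str, evidence: list[Evidence], today: date,
           llm: Callable[[str], str]) -> str:
    """Refuse without calling the model when there is nothing to ground on."""
    prompt = build_grounded_prompt(question, evidence, today)
    if prompt is None:
        return REFUSAL
    return llm(prompt)
```

Refusing before the model call keeps the behavior deterministic and testable, instead of relying on the model to notice that its context is empty.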

Built as a static bilingual AI engineering stack guide.