RAG Pipeline Demo

Retrieval-Augmented Generation with local embeddings

Runs in browser · Privacy-preserving local retrieval


How it works

  • Prefers WebGPU when available, then falls back to other browser backends
  • Uses all-MiniLM-L6-v2 (32MB) to produce 384-dimensional semantic embeddings
  • Ranks documents by the cosine similarity between the query embedding and each document embedding
  • No API keys or server-side processing required
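The two mechanics listed above, backend fallback and cosine ranking, can be sketched in a few lines of TypeScript. This is a minimal sketch under stated assumptions, not the demo's own code: the helper names `pickBackend`, `cosineSimilarity`, and `rankDocuments` are hypothetical, the `"wasm"` fallback label is an assumption (the demo only says "other browser backends"), and embeddings are assumed to be precomputed number arrays. Toy 3-dimensional vectors stand in for real 384-dimensional all-MiniLM-L6-v2 embeddings.

```typescript
// Hypothetical backend labels; the demo only says it prefers WebGPU and
// falls back to "other browser backends". "wasm" is an assumed fallback.
type Backend = "webgpu" | "wasm";

interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Prefer WebGPU only if the API exists AND an adapter is actually granted;
// navigator.gpu can be present while requestAdapter() resolves to null.
async function pickBackend(
  gpu: GpuLike | undefined = (globalThis as any).navigator?.gpu,
): Promise<Backend> {
  if (gpu) {
    try {
      if (await gpu.requestAdapter()) return "webgpu";
    } catch {
      // Treat adapter errors the same as "no WebGPU" and fall back.
    }
  }
  return "wasm";
}

interface Doc {
  id: string;
  embedding: number[]; // assumed precomputed; 384 dims for all-MiniLM-L6-v2
}

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and keep the topK best matches.
function rankDocuments(
  query: number[],
  docs: Doc[],
  topK = 3,
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy 3-dimensional vectors stand in for real 384-dimensional embeddings.
const docs: Doc[] = [
  { id: "refund-policy", embedding: [0.9, 0.1, 0.0] },
  { id: "shipping-times", embedding: [0.1, 0.9, 0.0] },
  { id: "office-hours", embedding: [0.0, 0.2, 0.9] },
];
const top = rankDocuments([1, 0, 0], docs, 2);
console.log(top.map((r) => r.id)); // → [ 'refund-policy', 'shipping-times' ]
pickBackend().then((b) => console.log(`backend: ${b}`));
```

Checking for an actual adapter, rather than only for `navigator.gpu`, matters because a browser can expose the WebGPU API and still refuse an adapter. If embeddings are L2-normalized when they are extracted, cosine similarity reduces to a plain dot product; the sketch divides by the norms so it works either way.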

Why this matters

Enterprise teams use RAG to ground responses in approved knowledge sources. This demo treats retrieval quality, backend fallback behavior, and operational reliability as platform concerns, rather than judging the system by model output quality alone.