Edge Agent + Cloud Agent Collaboration


Privacy-first browser-side extraction → governed handoff to cloud orchestration

Unlike the Browser Native AI Skill demo (Chrome Prompt API), this pipeline uses cross-browser Transformers.js ONNX inference: no single-browser dependency, no server, and no API key for the edge tier.

Transformers.js · ONNX · HITL Gate · Groq · Privacy-First

Three-Tier Architecture

1. Browser / Edge Agent: BERT NER via Transformers.js WASM
2. Policy Gate (HITL): user inspects the redacted payload
3. Tool Gateway → Cloud Agent: Groq · Llama 3.3 · summary

Tier 1
Edge Agent — Local Inference

BERT NER runs in your browser via Transformers.js ONNX. No server call, no API key. First-time load may take 30–60 seconds while the model downloads to your device.
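The redaction step that follows local NER can be sketched as a pure function. This is a minimal illustration, not the demo's actual code; it assumes entity spans arrive as character offsets, and the field names (`type`, `start`, `end`) are placeholders.

```javascript
// Minimal sketch of browser-side PII redaction over NER output.
// Entity spans (start/end character offsets) and field names are assumptions,
// not the demo's actual schema.
function redact(text, entities) {
  // Replace spans from the end of the string so earlier offsets stay valid.
  return [...entities]
    .sort((a, b) => b.start - a.start)
    .reduce(
      (out, e) => out.slice(0, e.start) + `[${e.type}]` + out.slice(e.end),
      text
    );
}

const text = 'Email Jane Doe at Acme Corp about renewal.';
const redacted = redact(text, [
  { type: 'PER', start: 6, end: 14 },
  { type: 'ORG', start: 18, end: 27 },
]);
// redacted → 'Email [PER] at [ORG] about renewal.'
```

Because the function is pure and runs in the same browser context as the model, no raw entity text needs to be serialized before the policy gate sees it.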

Why This Architecture

Privacy by Design

PII never leaves the browser. Cloud receives only business context. Data residency is enforced architecturally, not by policy alone.
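The "business context only" claim can be made concrete with a sketch of the cloud-bound payload: entity *types* travel, raw values do not. Function and field names here are illustrative, not the demo's actual schema.

```javascript
// Sketch of the payload the cloud tier receives. Raw PII stays in the browser;
// only the masked text and entity type labels are serialized.
// Names (toCloudPayload, entityTypes) are illustrative assumptions.
function toCloudPayload(redactedText, entities) {
  return {
    text: redactedText,                                     // already masked in-browser
    entityTypes: [...new Set(entities.map((e) => e.type))], // e.g. ['PER', 'ORG']
    source: 'edge-agent',
  };
}

const payload = toCloudPayload('Renewal for [PER] at [ORG]', [
  { type: 'PER', text: 'Jane Doe' },
  { type: 'ORG', text: 'Acme Corp' },
]);
// JSON.stringify(payload) contains no raw names: residency holds by construction
```

This is what "enforced architecturally" means in practice: the serialization boundary simply has no field that could carry the raw values.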

Cost Efficiency

Edge handles extraction at zero inference cost. Cloud handles reasoning, spending tokens only on sanitized, high-signal payloads.

Enterprise Governance

The HITL approval gate is enforced at both the UI and the API layer. Every handoff is auditable by design: the pattern enterprises need for regulated AI workflows.
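Server-side enforcement of the gate can be sketched as a check that runs even if the UI is bypassed. Handler and field names below are assumptions for illustration, not the demo's actual API.

```javascript
// Sketch of API-layer HITL enforcement: a handoff dispatches to the cloud
// only if the reviewer explicitly approved it, regardless of what the UI did.
// Names (dispatchToCloud, approvals, payload.id) are illustrative.
function dispatchToCloud(payload, approvals) {
  // Block any handoff lacking an approval record, e.g. a crafted request
  // hitting the API directly and skipping the UI check.
  if (!approvals.has(payload.id)) {
    return { status: 403, error: 'handoff not approved by reviewer' };
  }
  // Audit record written before cloud dispatch: every approval is traceable.
  const audit = { id: payload.id, approvedAt: Date.now() };
  return { status: 200, audit };
}

const approvals = new Set(['h-1']);
const blocked = dispatchToCloud({ id: 'h-2' }, approvals); // status 403
const allowed = dispatchToCloud({ id: 'h-1' }, approvals); // status 200
```

Duplicating the check at the API layer is what makes the UI gate trustworthy: the browser check is for usability, the server check is the actual control.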

Tier Comparison
DimensionEdge TierCloud Tier
Inference costZeroOptimized via LLM Router
Data exposureNone (local only)Sanitized payload only
Latency~100–500ms WASM~1–3s cloud round-trip
CapabilityExtraction, NERReasoning, summarization

This demo uses real browser-side ONNX inference via Transformers.js. It can be connected to Gemma-family models, Phi-3, or any ONNX-compatible SLM depending on deployment constraints and model licensing.
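Swapping the edge model for another ONNX-compatible SLM could be driven by a small registry keyed on task and device budget. The registry shape, selection rule, and all model IDs except `Xenova/bert-base-NER` are hypothetical.

```javascript
// Illustrative registry for choosing an edge SLM under deployment constraints.
// Only 'Xenova/bert-base-NER' is a real published model ID here; the others
// are placeholders, and sizes are rough assumptions. Check model licensing
// before deployment.
const EDGE_MODELS = [
  { id: 'Xenova/bert-base-NER', task: 'ner', sizeMB: 110 },
  { id: 'phi-3-mini-onnx', task: 'generation', sizeMB: 2300 },  // hypothetical ID
  { id: 'gemma-2b-onnx', task: 'generation', sizeMB: 1500 },    // hypothetical ID
];

// Pick the smallest model for a task that fits the device's memory budget.
function pickEdgeModel(task, budgetMB) {
  return (
    EDGE_MODELS
      .filter((m) => m.task === task && m.sizeMB <= budgetMB)
      .sort((a, b) => a.sizeMB - b.sizeMB)[0] ?? null
  );
}

const chosen = pickEdgeModel('ner', 512); // → the BERT NER entry
const none = pickEdgeModel('generation', 1000); // → null, budget too small
```

Keeping model choice in data rather than code lets the same three-tier pipeline ship with different edge models per deployment target.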