Prasad Kavuri

AI Demo Index

All Production Demos

14 production demos on a shared governance foundation, each tied to an enterprise platform concern: quality, reliability, cost, retrieval, orchestration, or oversight.

AI-Powered Tools

14 production demos — all running on shared governance infrastructure: guardrails, observability, evaluation, and drift monitoring at the platform layer.

New to the platform? Start with Platform Capabilities for a leadership-level map, then review the AI Evaluation Showcase to see the full governance pipeline, or browse the canonical demos index.

How AI Quality Is Measured

Offline LLM-as-Judge eval cases with semantic fidelity scoring.

Online drift snapshots with hallucination and anomaly indicators.

Regression-aware quality gates designed for release readiness.
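
As an illustration of the regression-aware quality gate listed above, here is a minimal Python sketch. The metric names, the 0.80 floor, and the 0.02 allowed drop are assumptions chosen for the example, not the thresholds used by the platform.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    reasons: list[str]


def regression_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    min_score: float = 0.80,  # assumed absolute floor per metric
    max_drop: float = 0.02,   # assumed tolerated drop versus baseline
) -> GateResult:
    """Fail when a metric is missing, below the floor, or regresses too far."""
    reasons = []
    for metric, base in baseline.items():
        score = candidate.get(metric)
        if score is None:
            reasons.append(f"{metric}: missing from candidate run")
        elif score < min_score:
            reasons.append(f"{metric}: {score:.2f} is below floor {min_score:.2f}")
        elif base - score > max_drop:
            reasons.append(f"{metric}: dropped {base - score:.2f} from baseline")
    return GateResult(passed=not reasons, reasons=reasons)
```

A CI job could call `regression_gate` with the last released scores as `baseline` and block the release whenever `passed` is false, printing `reasons` as the failure message.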

Local-First AI Demos

RAG, Vector Search, Multimodal, and Quantization run in-browser with client-side inference paths.

This reduces server-side data exposure for demo workloads and showcases privacy-aware execution patterns.

The trade-off is explicit: local execution improves privacy and cost posture, while server-hosted models handle heavier reasoning workloads.

Signature Quality System · Flagship Demo

AI Evaluation Showcase

Live

Closed-loop LLM evaluation pipeline — semantic fidelity, hallucination detection, guardrails, and CI gating in action. Production-derived eval thresholds — calibrated from real Krutrim deployment patterns. Demonstrates the quality loop recruiters and CTOs look for: offline eval coverage, online drift monitoring, hallucination indicators, and CI-ready regression gating.

LLM-as-Judge · Semantic Fidelity · Guardrails · CI Gating · Drift Monitoring · Quality Gates
Core AI Infrastructure

Foundation systems for scalable AI platforms

Live
RAG Pipeline

Improves grounded enterprise knowledge retrieval and reduces unsupported AI answers in operational workflows

Real retrieval-augmented generation with Transformers.js embeddings and ChromaDB — runs entirely in your browser.

Shows how enterprise knowledge can be retrieved with source traceability, relevance controls, and citation — not hallucination.

Transformers.js · ChromaDB · nomic-embed-text
observability
Open demo
Live
LLM Router

Balances quality, latency, and spend across model tiers for production AI request routing

Real multi-model routing across Llama 3.1 8B, 70B, and Mixtral — see live latency, cost, and quality trade-offs.

Routes each AI request to the optimal model — from fast WASM models for simple tasks to Qwen3.6-27B for agentic reasoning — demonstrating LLM FinOps and decision intelligence at the platform level.

Groq · Multi-model · Live latency
guardrails · eval engine · drift monitor
Open demo
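
For the LLM Router above, the idea of sending each request to the cheapest adequate model tier can be sketched as follows. This is a toy illustration: the tier names, the per-token costs, and the keyword-based complexity heuristic are all hypothetical and are not taken from the demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_complexity: float      # highest task complexity (0 to 1) this tier should handle
    cost_per_1k_tokens: float  # illustrative number, not a real price


# Hypothetical tiers, ordered from cheapest to most capable.
TIERS = [
    ModelTier("small-local", 0.3, 0.0),
    ModelTier("mid-hosted", 0.7, 0.05),
    ModelTier("large-hosted", 1.0, 0.60),
]

REASONING_KEYWORDS = {"plan", "analyze", "compare", "prove", "debug"}


def estimate_complexity(prompt: str) -> float:
    """Toy heuristic: longer prompts and reasoning keywords score higher."""
    words = prompt.lower().split()
    length_score = min(len(words) / 200, 1.0)
    keyword_score = 0.5 if REASONING_KEYWORDS & set(words) else 0.0
    return min(length_score + keyword_score, 1.0)


def route(prompt: str, tiers=TIERS) -> ModelTier:
    """Pick the cheapest tier whose capability covers the estimated complexity."""
    complexity = estimate_complexity(prompt)
    eligible = [t for t in tiers if t.max_complexity >= complexity]
    return min(eligible, key=lambda t: t.cost_per_1k_tokens)
```

A production router would likely replace the heuristic with a learned classifier and fold observed latency into the selection key, but the shape of the decision stays the same: filter by capability, then minimize cost.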
Desktop · Live
Vector Search

Enables semantic discovery and natural-language retrieval across enterprise content systems

Semantic search with real sentence-BERT embeddings and UMAP visualisation of the embedding space.

High-throughput semantic search over large corpora with real-time filtering — the retrieval layer for any serious AI product.

all-MiniLM-L6-v2 · UMAP · Cosine similarity
Open demo
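
The cosine-similarity ranking that the Vector Search demo above is built on can be sketched in a few lines of plain Python. The two-dimensional vectors in the usage note are toy stand-ins; in the demo the vectors would come from an embedding model such as all-MiniLM-L6-v2.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

For example, with `corpus = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}`, calling `top_k([1, 0], corpus)` ranks `"a"` first and `"c"` second, because `"b"` is orthogonal to the query.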
Live
AI Evaluation Showcase

Improves release confidence through measurable quality gates and regression visibility before deployment

Closed-loop LLM evaluation pipeline — semantic fidelity, hallucination detection, guardrails, and CI gating in action. Production-derived eval thresholds — calibrated from real Krutrim deployment patterns.

Catches quality regressions before they reach production — the governance layer that separates AI experiments from AI platforms.

LLM-as-Judge · Semantic Fidelity · Guardrails · CI Gating
Open demo
Live
Native Browser AI Skill

No network round trip and on-device privacy (edge inference) for accessibility auditing workflows

A reusable Chrome AI Skill that audits webpage accessibility using on-device Gemini Nano.

On-device AI inference with zero server dependency — the architecture pattern for compliance-sensitive enterprise tooling.

Chrome Prompt API · Gemini Nano · WASM
Open demo
Agentic Systems

Autonomous agents and tool-use orchestration

Live
Multi-Agent System

Coordinates specialized agent workflows with approvals and auditability for high-impact enterprise decisions

CrewAI-powered agents with real LLM calls via Groq — Analyzer, Researcher, and Strategist collaborating in real time.

Demonstrates governed agentic workflows with human-in-the-loop approval checkpoints, audit trails, and role-based orchestration — safe for enterprise deployment.

CrewAI · Groq · Llama 3.3 · Handoff Architecture · Audit Trail · Agent Orchestration
planning · hitl · guardrails · observability · eval engine · drift monitor
Open demo
Live
MCP Tool Demo

Improves reliability by standardizing tool access across agent workflows

Model Context Protocol in action — watch an LLM discover and call tools to answer questions about Prasad's background.

Shows how standardized tool protocols reduce integration overhead and make agent capabilities composable across enterprise systems.

MCP · Tool Use · Groq API
observability · guardrails
Open demo
Live
Enterprise Control Plane

Operationalizes enterprise AI oversight with RBAC, spend controls, and traceable policy enforcement

Org-wide AI governance dashboard — RBAC, group spend limits with token-cost tracking, and structured observability feed.

Operational guardrails for enterprise AI: RBAC, spend analytics, token budgets, and structured observability in a single control surface.

Enterprise · RBAC · Structured Observability · Token Analytics
Open demo
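
The combination of RBAC, group token budgets, and a structured audit feed described for the Enterprise Control Plane above might look roughly like this. The role names, permission sets, and log fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "admin": {"invoke_model", "view_spend", "edit_policy"},
    "analyst": {"invoke_model", "view_spend"},
    "viewer": {"view_spend"},
}


@dataclass
class GroupBudget:
    token_limit: int
    tokens_used: int = 0


@dataclass
class ControlPlane:
    budgets: dict
    audit_log: list = field(default_factory=list)

    def authorize(self, role: str, group: str, action: str, tokens: int = 0) -> bool:
        """Allow the action only if the role permits it and the group budget covers it."""
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        budget = self.budgets.get(group)
        within_budget = (
            budget is not None and budget.tokens_used + tokens <= budget.token_limit
        )
        decision = allowed and within_budget
        if decision:
            budget.tokens_used += tokens
        # Every decision, allowed or denied, becomes a structured log entry.
        self.audit_log.append(
            {"role": role, "group": group, "action": action,
             "tokens": tokens, "allowed": decision}
        )
        return decision
```

Logging denials as well as approvals is the design choice that makes policy enforcement traceable: the audit feed shows what was blocked and why, not just what ran.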
Live
Edge Agent + Cloud Agent Collaboration

Enforces data residency at the device boundary while delivering cloud-quality reasoning — zero PII exposure to external APIs

Three-tier privacy-first AI pipeline: BERT NER redacts PII in the browser via Transformers.js ONNX, a HITL gate governs the handoff, and Groq produces an executive summary from the sanitized payload only.

Demonstrates the governance-first agentic handoff pattern enterprises need for regulated AI workflows: edge extraction, explicit HITL approval, and auditable cloud orchestration.

edge-ai · browser-agent · local-inference · privacy-first-ai · tool-gateway · agentic-ai · governance · sovereign-ai
hitl · guardrails · observability
Open demo
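
The redact, approve, then send flow of the Edge Agent + Cloud Agent demo above can be sketched as below. Note the substitution: the demo uses a BERT NER model for redaction, while this sketch swaps in two simple regular expressions so that it stays self-contained. The `approve` and `send_to_cloud` callables are hypothetical placeholders for the HITL gate and the cloud call.

```python
import re

# Simplified stand-in for NER-based detection: regexes for two PII types only.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace each detected PII span with a label and count the replacements."""
    counts = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


def handoff(text, approve, send_to_cloud):
    """Redact locally, ask a human reviewer, and forward only the sanitized payload."""
    sanitized, counts = redact(text)
    if not approve(sanitized, counts):
        return None  # reviewer rejected the handoff; nothing leaves the device
    return send_to_cloud(sanitized)
```

The key property is that `send_to_cloud` never receives the raw text: only the sanitized string crosses the boundary, and only after explicit approval.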
Desktop · Live
Real-Time Spatial AI + World Modeling Engine

Accelerates logistics and spatial planning with policy-aware world artifacts that are explainable, reviewable, and simulation-ready

Perception → reconstruction → agent reasoning. Precomputed 3D mesh playback with drift correction visualization and LLM spatial query layer. Controllable parametric spatial design — refine generated scenes with natural-language instructions. Changes are validated, diffed, and auditable.

Brings LLM reasoning into spatial and operational planning — policy-aware world models that are auditable, diffable, and simulation-ready.

World Generation · Spatial AI · Three.js · GLB Export · Governance · Simulation-Ready · Desktop-Friendly · World Model · Perception · Parametric Refinement · Instruction-Led Editing · Scene Diff
Open demo
AI Applications

Production AI experiences across modalities

Live
AI Portfolio Assistant

Cuts expert lookup time by making organizational knowledge instantly accessible

Streaming full-context assistant over my experience with optional retrieval-enhanced grounding and cited context cues.

Demonstrates conversational AI grounded in structured knowledge — RAG + LLM working together on a real corpus.

Vercel AI SDK · Streaming · Retrieval Grounding
guardrails · observability · eval engine
Open demo
Live
AI Hiring Intelligence

Reduces recruiting cycle time through faster candidate-role alignment

Paste a job description — get multi-dimension fit scoring, HITL-gated tailoring, and an ATS-optimized resume with drift detection.

AI-powered resume tailoring at scale — shows LLM orchestration applied to a high-frequency, measurable business workflow.

JD parsing · Skill matching · HITL · Evaluation · Multi-Agent
planning
Open demo
Desktop · Live
Multimodal Assistant

Lowers processing costs by running vision workflows closer to users

Florence-2 image captioning and OCR running in-browser via Transformers.js — no server, no API key.

Edge-deployed vision AI with zero server cost — the architecture pattern for privacy-sensitive document and image workflows.

Florence-2 · WebGPU · In-browser
Open demo
Desktop · Live
Model Quantization

Reduces infrastructure overhead through smaller, faster production models

Live ONNX benchmark comparing INT8 vs FP32 inference — real file sizes, real latency, real quality diff.

Demonstrates 4-bit MoE quantization delivering 70%+ memory reduction with minimal quality loss — the cost lever most teams overlook.

ONNX · INT8 vs FP32 · Transformers.js
Open demo
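
To make the INT8 versus FP32 comparison in the Model Quantization demo above concrete, here is a minimal sketch of symmetric per-tensor quantization in plain Python. It is one common scheme, assumed here for illustration, and not necessarily the one the ONNX benchmark uses. The memory figure counts raw weight storage only and ignores scales, activations, and runtime overhead.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the shared scale."""
    return [q * scale for q in quantized]


def memory_reduction(bits_before, bits_after):
    """Fraction of raw weight storage saved when moving between bit widths."""
    return 1 - bits_after / bits_before
```

By this arithmetic, going from 32-bit to 8-bit weights saves 75% of raw weight storage, and going to 4-bit saves 87.5%. The per-weight reconstruction error is bounded by half the scale, which is why quality loss is often small when the weight distribution has no extreme outliers.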