AI Skills Catalog

Reusable capability modules powering every demo on this site. This is the orchestration substrate — the layer that makes models interchangeable and systems governable.

Guardrails

Safety

Content filtering and policy enforcement layer. Intercepts model output before it reaches the user, enforcing topic boundaries, competitor mention rules, and toxicity thresholds.

Input: Raw LLM output string + active policy config
Output: Filtered string + violation log entry (if triggered)
src/lib/guardrails.ts

Prevents brand/compliance risk in customer-facing deployments without re-prompting the model on every call.
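A minimal sketch of what such a filter could look like. The names here (`PolicyConfig`, `GuardrailResult`, `applyGuardrails`) are illustrative assumptions, not the actual `src/lib/guardrails.ts` API; a real policy layer would cover toxicity scoring and topic boundaries, not just term matching.

```typescript
interface PolicyConfig {
  blockedTerms: string[]; // e.g. competitor names or banned topics
  replacement: string;    // substituted for any blocked term
}

interface GuardrailResult {
  filtered: string;
  violations: string[];   // one log entry per triggered rule
}

// Intercept raw model output: redact blocked terms and record each violation.
function applyGuardrails(raw: string, policy: PolicyConfig): GuardrailResult {
  const violations: string[] = [];
  let filtered = raw;
  for (const term of policy.blockedTerms) {
    if (filtered.toLowerCase().includes(term.toLowerCase())) {
      violations.push(`blocked term matched: ${term}`);
      filtered = filtered.replace(new RegExp(term, "gi"), policy.replacement);
    }
  }
  return { filtered, violations };
}
```

Because the filter runs post-generation, it adds no model latency and needs no prompt changes when policy updates.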

Observability & Tracing

Observability

End-to-end Trace-ID propagation across all agent hops. Every request gets a correlation ID that flows through model calls, tool invocations, and API responses — enabling full lineage replay.

Input: Incoming request + optional parent trace-id header
Output: Enriched request context with trace-id, span metadata, and timing
src/lib/observability.ts

Reduces mean time-to-diagnose from hours to minutes when model behaviour regresses in production.
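The propagation logic can be sketched like this, assuming header names `x-trace-id` / `x-span-id` and a helper `enrichWithTrace` that are illustrative, not the actual `src/lib/observability.ts` interface:

```typescript
import { randomUUID } from "crypto";

interface TraceContext {
  traceId: string;        // shared across every hop of one request
  spanId: string;         // unique to this hop
  parentSpanId?: string;  // links back to the calling hop
  startedAt: number;      // epoch ms, for span timing
}

// Reuse an incoming trace-id if present; otherwise this hop starts a new trace.
function enrichWithTrace(headers: Record<string, string>): TraceContext {
  return {
    traceId: headers["x-trace-id"] ?? randomUUID(),
    spanId: randomUUID(),
    parentSpanId: headers["x-span-id"],
    startedAt: Date.now(),
  };
}
```

The key property is that the trace-id is generated exactly once per request and copied, never regenerated, on every downstream hop, which is what makes full lineage replay possible.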

Evaluation Engine

Evaluation

Automated scoring of model outputs against ground-truth rubrics. Runs latency, relevance, hallucination, and policy-compliance checks as a CI gate and live health signal.

Input: Model output + evaluation rubric (JSON) + optional reference answer
Output: Score object with per-dimension breakdown and pass/fail flag
src/lib/eval-engine.ts

Closes the feedback loop between deploy and degrade — catches quality regressions before users report them.
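A weighted-rubric scorer might look like the following; the rubric shape and the 0.7 pass threshold are assumptions for illustration and do not reflect the actual `src/lib/eval-engine.ts` schema:

```typescript
interface RubricDimension {
  name: string;                                   // e.g. "relevance", "latency"
  weight: number;
  check: (output: string, reference?: string) => number; // returns 0..1
}

interface ScoreResult {
  dimensions: Record<string, number>;  // per-dimension breakdown
  total: number;                       // weighted average, 0..1
  pass: boolean;                       // CI gate flag
}

// Score one output against every rubric dimension and gate on a threshold.
function evaluate(
  output: string,
  rubric: RubricDimension[],
  threshold = 0.7,
  reference?: string,
): ScoreResult {
  const dimensions: Record<string, number> = {};
  let weighted = 0;
  let weightSum = 0;
  for (const d of rubric) {
    const score = d.check(output, reference);
    dimensions[d.name] = score;
    weighted += score * d.weight;
    weightSum += d.weight;
  }
  const total = weightSum ? weighted / weightSum : 0;
  return { dimensions, total, pass: total >= threshold };
}
```

Running the same `evaluate` call both in CI and against sampled live traffic is what lets one rubric serve as both a deploy gate and a health signal.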

Drift Monitor

Evaluation

Tracks statistical shifts in model output distributions over time. Alerts when response length, tone, or topic distribution deviates beyond a configured threshold from the baseline window.

Input: Rolling window of model responses + baseline statistics
Output: Drift score (0–1), alert flag, and delta report vs. baseline
src/lib/drift-monitor.ts

Detects silent model degradation — e.g. after a provider-side model update — without requiring explicit user feedback.
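One simple instance of this idea, using response length as the tracked statistic: the baseline shape, the z-score normalisation, and the 3-sigma saturation below are illustrative choices, not the actual `src/lib/drift-monitor.ts` method (which also covers tone and topic distribution).

```typescript
interface BaselineStats {
  meanLength: number;
  stdLength: number;
}

interface DriftReport {
  driftScore: number; // 0..1
  alert: boolean;
  delta: number;      // window mean minus baseline mean
}

// Compare the rolling window's mean response length to the baseline window;
// the drift score is the absolute z-score squashed into [0, 1], saturating at 3 sigma.
function checkDrift(
  window: string[],
  baseline: BaselineStats,
  threshold = 0.5,
): DriftReport {
  const mean = window.reduce((sum, r) => sum + r.length, 0) / window.length;
  const delta = mean - baseline.meanLength;
  const z = Math.abs(delta) / (baseline.stdLength || 1);
  const driftScore = Math.min(1, z / 3);
  return { driftScore, alert: driftScore >= threshold, delta };
}
```

A sudden jump in this score after a provider-side model update is exactly the silent-degradation signal the module is built to catch.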

Human-in-the-Loop (HITL)

Governance

Structured approval checkpoint between autonomous agent steps. Pauses execution after high-stakes decisions, surfaces a review payload to the operator, and resumes only on explicit approval.

Input: Agent action proposal + confidence score + context snapshot
Output: Approval decision (approve / reject / modify) + audit log entry
src/app/demos/multi-agent/page.tsx
Used in: multi-agent

The industry-standard pattern for keeping humans in control of agentic systems in regulated industries (finance, healthcare, legal).
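The checkpoint pattern can be sketched as an async gate, with a confidence-based auto-approve path so operators only see the decisions that need them. The names (`checkpoint`, `Proposal`, the 0.95 auto-approve cutoff) are hypothetical, not the demo's actual implementation.

```typescript
type Decision = "approve" | "reject" | "modify";

interface Proposal {
  action: string;
  confidence: number;   // agent's self-reported confidence, 0..1
  context: unknown;     // snapshot surfaced to the reviewer
}

interface AuditEntry {
  action: string;
  decision: Decision;
  at: number;
}

const auditLog: AuditEntry[] = [];

// Pause before executing a high-stakes action: auto-approve only very
// confident proposals, route everything else to a human reviewer, and
// record every decision in the audit log either way.
async function checkpoint(
  proposal: Proposal,
  review: (p: Proposal) => Promise<Decision>,
  autoApproveAbove = 0.95,
): Promise<Decision> {
  const decision =
    proposal.confidence >= autoApproveAbove ? "approve" : await review(proposal);
  auditLog.push({ action: proposal.action, decision, at: Date.now() });
  return decision;
}
```

Because the gate is a plain awaited promise, the agent loop simply suspends at the checkpoint and resumes only when `review` resolves, which is what makes the pause/resume semantics trivial to reason about.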

Agent Planning

Orchestration

Decomposes a high-level user goal into an ordered sequence of sub-tasks, assigns each to the appropriate specialist agent, and manages execution order with dependency awareness.

Input: User goal string + available agent registry
Output: Ordered execution plan with agent assignments and fallback paths
src/app/demos/multi-agent/page.tsx

Separates "what to do" from "how to do it" — enabling the orchestration layer to be model-agnostic and swappable.

Architecture Layer

Model (Claude / OpenAI / Gemma)
    ↓
Agent Orchestrator
    ↓
Skills Layer  ← You are here
    ↓
Tool Gateway
    ↓
User Interface