AI Portfolio Voice Agent

Live

Talk to the portfolio in any language

Started: May 7, 2026 · ~16 hours over 2 days

AI: Claude Code, Gemini API
Stack: Next.js 16, React 19, TypeScript, Agora Conversational AI, Agora RTC, Gemini 3.1 Flash Live, Upstash Redis, Vercel

A real-time voice docent built into this portfolio. Click the mic, speak in any language, and an AI agent describes Alex's projects, opens them on demand, and offers to walk you through his career arc — all over a single multimodal channel with no separate STT or TTS. Ships with three layers of cost guardrails, private session analytics for post-launch debugging, and a drift-resistant QA harness that runs before every push.

// highlights

  • Single-vendor multimodal voice (audio in / reasoning / audio out) via Gemini 3.1 Flash Live — no separate STT or TTS pipeline
  • Belt-and-suspenders tool calling: real function_calls plus a six-layer transcript-pattern fallback (commitment, order-aware suppression, commitment-text scoping, writeup-vs-domain, anaphora, and booking-before-LinkedIn priority)
  • Three-layer Upstash guardrails — single-session lock, per-IP rate limit, daily budget kill switch with refund on early end
  • Private session analytics — every transcript turn and tool call logs to Upstash with a 30-day TTL, so production failures get diagnosed in seconds rather than guessed at
  • Hover-toast pulse trigger UI that stays out of the way until a visitor leans in; tool-failure toast surfaces real error messages above the in-session UI when a tool returns ok:false
  • Drift-resistant QA harness extracts regex patterns from source — 46 cases run on every preship, each one a verbatim production transcript that broke an earlier version
  • Reverse-engineered Agora's chunked datastream wire format from runtime logs to recover early-arriving messages
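
The three guardrail layers above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it swaps Upstash Redis for a hypothetical in-memory store so it runs on its own, and it assumes the single-session lock is global (one live session at a time) and that the daily budget is counted in seconds. All names (MemoryStore, startSession, endSession, Limits) are invented for the example.

```typescript
// Hypothetical in-memory stand-in for a key-value store with TTLs.
// A real deployment would back this with Redis; these names are assumptions.
type Entry = { value: number; expiresAt: number };

class MemoryStore {
  private data = new Map<string, Entry>();

  // Return the entry if it exists and has not expired.
  private live(key: string, now: number): Entry | undefined {
    const e = this.data.get(key);
    if (e && e.expiresAt <= now) {
      this.data.delete(key);
      return undefined;
    }
    return e;
  }

  // Set only if the key is absent or expired. Returns true when the key was set.
  setIfAbsent(key: string, ttlMs: number, now: number): boolean {
    if (this.live(key, now)) return false;
    this.data.set(key, { value: 1, expiresAt: now + ttlMs });
    return true;
  }

  // Add delta to a counter, creating it with the given TTL if missing.
  add(key: string, delta: number, ttlMs: number, now: number): number {
    const e = this.live(key, now) ?? { value: 0, expiresAt: now + ttlMs };
    e.value += delta;
    this.data.set(key, e);
    return e.value;
  }

  del(key: string): void {
    this.data.delete(key);
  }
}

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

interface Limits {
  maxSessionSec: number;  // seconds reserved up front for each session
  perIpPerHour: number;   // session starts allowed per IP per hour
  dailyBudgetSec: number; // total session seconds allowed per day
}

type Decision = { ok: boolean; reason?: "busy" | "rate_limited" | "budget_exhausted" };

function startSession(store: MemoryStore, ip: string, day: string, limits: Limits, now: number): Decision {
  // Layer 1: single-session lock, expiring on its own if the session never ends cleanly.
  if (!store.setIfAbsent("lock", limits.maxSessionSec * 1000, now)) {
    return { ok: false, reason: "busy" };
  }
  // Layer 2: per-IP rate limit.
  const hits = store.add(`ip:${ip}`, 1, HOUR_MS, now);
  if (hits > limits.perIpPerHour) {
    store.del("lock");
    return { ok: false, reason: "rate_limited" };
  }
  // Layer 3: daily budget kill switch. Reserve a full session up front.
  const used = store.add(`budget:${day}`, limits.maxSessionSec, DAY_MS, now);
  if (used > limits.dailyBudgetSec) {
    store.add(`budget:${day}`, -limits.maxSessionSec, DAY_MS, now); // undo the reservation
    store.del("lock");
    return { ok: false, reason: "budget_exhausted" };
  }
  return { ok: true };
}

// Release the lock and refund the unused part of the reservation.
// Returns the budget seconds consumed so far that day.
function endSession(store: MemoryStore, day: string, limits: Limits, usedSec: number, now: number): number {
  const refund = Math.max(0, limits.maxSessionSec - usedSec);
  store.del("lock");
  return store.add(`budget:${day}`, -refund, DAY_MS, now);
}
```

Reserving the full session length at start and refunding on early end keeps the budget check conservative: a crashed session costs its whole reservation rather than nothing.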

// takeaways

  • Preview-tier multimodal models will narrate function calls without emitting them — always plan a transcript-pattern fallback before relying on tool-call reliability
  • Distinguishing 'tell me about X' from 'show me X' is the core voice-UX problem, not a side detail. The agent should be a docent that offers, not a teleporter that yanks
  • Build private analytics on day one. Reading one real session transcript beats hours of speculation. The diagnostic CLI we built turned every 'it didn't work' report into a 30-second answer
  • Anaphora and past-tense narration are the dominant production failure modes for project navigation, and they only show up after real users use the agent. The QA harness needs verbatim production transcripts as cases, not invented ones
  • Multimodal Live beats an STT + LLM + TTS pipeline on latency and vendor count, but the trade-off is tool-call reliability. Pick the architecture based on whether your UX needs sub-second turns or rock-solid function calling
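
A transcript-pattern fallback of the kind described in the first two takeaways might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the project's six-layer implementation: the project slugs, the regexes, and the names fallbackToolCall and ToolCall are all hypothetical. It covers only two ideas, commitment detection (the agent says it is opening something) and commitment-text scoping (only text after the commitment phrase is searched for a project name).

```typescript
// Hypothetical tool-call shape; the real agent's schema is not shown in the source.
type ToolCall = { name: "open_project"; slug: string };

// Hypothetical project catalog: each slug has a pattern matching how it is spoken.
const PROJECTS: { slug: string; pattern: RegExp }[] = [
  { slug: "voice-agent", pattern: /voice agent/i },
  { slug: "budget-app", pattern: /budget app/i },
];

// Phrases where the agent commits to navigating, as opposed to merely describing.
// These phrasings are assumptions made for the example.
const COMMITMENT = /\b(?:i'll open|let me open|opening|pulling up|taking you to)\b/i;

// Decide whether to synthesize a navigation call from the agent's spoken turn.
function fallbackToolCall(agentTurn: string, realCalls: ToolCall[]): ToolCall | null {
  // A real function call always wins; the fallback only fills the gap.
  if (realCalls.length > 0) return null;

  // "Tell me about X" answers mention projects without committing to open them.
  const m = COMMITMENT.exec(agentTurn);
  if (!m) return null;

  // Scope the project search to the text from the commitment phrase onward,
  // so a project mentioned earlier in the turn is not opened by mistake.
  const scoped = agentTurn.slice(m.index);
  for (const { slug, pattern } of PROJECTS) {
    if (pattern.test(scoped)) return { name: "open_project", slug };
  }
  return null;
}
```

A descriptive turn such as "The voice agent is a real-time docent" matches no commitment phrase and returns null, which is the docent-not-teleporter behavior. A turn like "The voice agent was fun to build. Let me open the budget app for you." resolves to budget-app because only the text after "Let me open" is searched.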

Alexander Lee · AiPM · 2026

built with claude code + next.js