TL;DR
Built a fully proprietary real-time conversational AI avatar system from scratch, outperforming HeyGen on real-time interactivity at the time of development
Engineered a custom phoneme-to-viseme lip-sync pipeline, built on Azure Cognitive Services yet vendor-agnostic by design, that takes audio in and produces facial expressions out, enabling lifelike speech animation for cartoon avatars
Deployed across multiple Alpha School products including AskElle and DreamLauncher, with a live interactive demo at personas.alpha.school featuring selectable historical figure personas
The Challenge
Alpha School runs on a radical premise: students spend just two hours per day on AI-driven core instruction, then own the rest of their time for passion projects, physical activity, and self-directed learning. To make that model work, the AI doing the teaching has to be extraordinary. It can't feel like a chatbot reading from a script. It has to feel like a tutor who knows the student, responds naturally, and keeps them engaged.
Existing avatar solutions weren't up to the task. HeyGen and similar platforms offered pre-rendered video loops with limited interactivity. They couldn't hold a real conversation, adapt to a student's current emotional state, or respond dynamically to what was happening in a lesson. For Alpha's vision of AI tutors that millions of students would interact with daily, these tools were a dead end.
Alpha needed a fully custom, real-time conversational avatar system: one that could be integrated into any product across their ecosystem, support thousands of simultaneous student sessions, and deliver the kind of lifelike, responsive interaction that makes students forget they're talking to software.
The technical bar was high. Real-time lip-sync for cartoon avatars is a hard problem. Natural-sounding, emotionally expressive AI voice is a hard problem. Building all of it into a scalable, multi-product platform, while shipping fast enough to keep pace with Alpha's weekly release cadence, made it harder still.
Key Results
Outperformed HeyGen on real-time interactivity at the time of build
Supports thousands of simultaneous avatar sessions
Multi-language and multi-resolution support across all devices
Live across AskElle and DreamLauncher with full educational context integration
Vendor-agnostic lip-sync pipeline enabling seamless TTS provider migration
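
To give a sense of what "vendor-agnostic" can mean in practice, here is a minimal Python sketch. It is an illustration under stated assumptions, not Alpha's actual implementation: it assumes each TTS provider is wrapped in an adapter that emits a provider-neutral phoneme event, and the names TimedPhoneme, VisemeKeyframe, PHONEME_TO_VISEME, and to_keyframes, along with the viseme labels themselves, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TimedPhoneme:
    """Provider-neutral phoneme event (hypothetical type for this sketch)."""
    symbol: str    # ARPAbet-style symbol such as "M" or "AA"
    start_ms: int  # offset from the start of the audio
    end_ms: int


@dataclass(frozen=True)
class VisemeKeyframe:
    """A mouth shape the avatar should hold over a time window."""
    viseme: str
    start_ms: int
    end_ms: int


# Illustrative mapping only; a real avatar rig would define its own viseme set.
PHONEME_TO_VISEME = {
    "M": "closed", "B": "closed", "P": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open", "AH": "open",
    "OW": "round", "UW": "round",
    "IY": "wide", "EH": "wide",
    "SIL": "rest",
}


def to_keyframes(phonemes: Iterable[TimedPhoneme]) -> List[VisemeKeyframe]:
    """Map timed phonemes to viseme keyframes, merging adjacent duplicates."""
    frames: List[VisemeKeyframe] = []
    for p in phonemes:
        # Unknown symbols fall back to the neutral mouth shape.
        viseme = PHONEME_TO_VISEME.get(p.symbol, "rest")
        last = frames[-1] if frames else None
        if last and last.viseme == viseme and last.end_ms == p.start_ms:
            # Same mouth shape with no gap: extend the previous keyframe.
            frames[-1] = VisemeKeyframe(viseme, last.start_ms, p.end_ms)
        else:
            frames.append(VisemeKeyframe(viseme, p.start_ms, p.end_ms))
    return frames


# Example: the word "mom" as three timed phonemes from any TTS adapter.
keyframes = to_keyframes([
    TimedPhoneme("M", 0, 80),
    TimedPhoneme("AA", 80, 200),
    TimedPhoneme("M", 200, 280),
])
```

With a shape like this, moving to a different TTS provider would only require a new adapter that produces TimedPhoneme events; the mapping table and the animation layer that consumes the keyframes would stay unchanged.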