Conversational AI Agent for Healthcare Client
We built a high-performance custom TTS pipeline that combines our proprietary system, a LLaMA LLM, and voice cloning, and integrated it into the client's voice agents to power real-time medical checkups.
Project Overview
A robust, scalable TTS solution that powers medical voice agents with cloned voices and real-time audio streaming, designed to replace commercial TTS APIs with faster, self-hosted inference.
The Challenge
The client needed a customizable, privacy-compliant voice stack with fast response times and support for domain-specific medical language in its healthcare agents.
Our Solution
We deployed a custom TTS model on Google Kubernetes Engine (GKE), served llama-3.3-70b-versatile through Groq for LLM-based correction in the voice pipeline, integrated Deepgram for speech-to-text (STT), and cloned 3 unique voices using our voice training pipeline.
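As a rough illustration of how these pieces fit together in one conversational turn, here is a minimal Python sketch, not the production code. The functions `transcribe`, `generate_reply`, `synthesize`, and `handle_turn` are hypothetical placeholders standing in for Deepgram, the Groq-hosted llama-3.3-70b-versatile model, and the custom TTS model; the LLM correction step is omitted, and synthesizing one token at a time is a simplification.

```python
import asyncio
from typing import AsyncIterator

# Hypothetical stand-ins for the real services (Deepgram STT, the Groq-hosted
# LLM, and the custom TTS model). Each stage is async so audio can start
# flowing before the full reply has been generated.

async def transcribe(audio: bytes) -> str:
    """Placeholder STT stage; a real system would call Deepgram here."""
    await asyncio.sleep(0)
    return audio.decode("utf-8")

async def generate_reply(transcript: str) -> AsyncIterator[str]:
    """Placeholder LLM stage that streams reply tokens one at a time."""
    for token in f"You said: {transcript}".split(" "):
        await asyncio.sleep(0)
        yield token + " "

async def synthesize(text: str) -> bytes:
    """Placeholder TTS stage; returns fake 'audio' bytes for a text chunk."""
    await asyncio.sleep(0)
    return text.encode("utf-8")

async def handle_turn(audio: bytes) -> list[bytes]:
    """Run one turn: STT, then stream LLM tokens straight into TTS."""
    transcript = await transcribe(audio)
    chunks: list[bytes] = []
    async for token in generate_reply(transcript):
        chunks.append(await synthesize(token))
    return chunks

if __name__ == "__main__":
    print(asyncio.run(handle_turn(b"hello")))
```

Because the LLM stage yields tokens incrementally, the TTS stage can start producing audio before the reply is complete, which is the property a low-latency voice agent depends on.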
Key Features
Ultra Low-Latency TTS
An optimized inference path using Groq and GKE-based streaming keeps latency consistently under 300 ms.
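The text does not say which latency is measured or what "consistently" means. One way to make such a target checkable is a nearest-rank percentile test over measured latencies. This sketch assumes per-request latency samples in milliseconds and a 95th-percentile threshold; both are hypothetical choices, and `percentile` and `within_budget` are illustrative names.

```python
import math

LATENCY_BUDGET_MS = 300.0  # target taken from the feature description

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of latency samples, with pct in (0, 100]."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]

def within_budget(samples: list[float], pct: float = 95.0,
                  budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True if the chosen latency percentile stays under the budget."""
    return percentile(samples, pct) < budget_ms
```

Checking a high percentile rather than the mean catches the tail latencies that a patient would actually notice as awkward pauses in a live conversation.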
Voice Cloning & Personalization
Cloned 3 distinct brand voices with emotional and tonal control suited for healthcare use cases.
End-to-End Integration with Voice Agents
Real-time integration with LiveKit and FastAPI brings conversational AI capabilities to patient-facing applications.
Technical Challenges & Solutions
Latency Bottlenecks
Off-the-shelf APIs were too slow for real-time interaction.
Solution: Used a customized SGLang deployment for inference acceleration and WebSockets for streaming.
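Streaming only pays off if synthesis starts before the full reply is ready. The source does not describe how text is chunked for streaming, so the following is an assumption: a common approach is to buffer streamed LLM tokens into sentence-sized chunks and send each chunk to TTS as soon as it completes. `sentence_chunks` is a hypothetical helper.

```python
import re
from typing import Iterable, Iterator

# Sentence boundary: '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed text tokens into sentences so TTS can start early.

    A sentence is yielded once its closing punctuation and the following
    whitespace have arrived; any trailing partial sentence is flushed
    when the token stream ends.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()
```

Decimals such as "2.5 mg" stay intact because the period is not followed by whitespace, but abbreviations like "Dr. Smith" would be split; a medical deployment would need an abbreviation-aware splitter.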
Voice Consistency Across Sessions
Cloned voices had to stay natural and keep a consistent identity across different sessions.
Solution: Trained correction layers using LLaMA and feedback from STT alignment.
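The source does not detail how STT feedback feeds the correction layers. One plausible ingredient, offered here as an assumption, is a round-trip check: transcribe the synthesized audio with STT and compare the transcript to the intended text using word error rate (WER), flagging high-WER utterances for correction or retraining. `word_error_rate`, `needs_review`, and the 0.1 threshold are hypothetical.

```python
import re

def _words(text: str) -> list[str]:
    """Lowercase and keep word tokens so punctuation is not counted as an error."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # delete a reference word
                          curr[j - 1] + 1,     # insert a hypothesis word
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1] / len(ref)

def needs_review(text: str, transcript: str, threshold: float = 0.1) -> bool:
    """Flag a synthesized utterance whose STT transcript drifts from its input text."""
    return word_error_rate(text, transcript) > threshold
```

For example, if "take two tablets daily" comes back from STT as "take too tablets daily", the WER is 0.25 (one substitution out of four words), so the utterance would be flagged.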
Scalability for Concurrent Sessions
The system had to support multiple simultaneous agents and users.
Solution: Deployed auto-scaling GKE pods with load balancing and health checks.
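This assumes the auto-scaling uses the standard Kubernetes Horizontal Pod Autoscaler (HPA); the source does not say which metric or replica limits were used. Under that assumption, the scaling decision follows the documented HPA rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The function below models only that rule. Its name and default limits are hypothetical, it omits stabilization windows and pod-readiness handling, and the metric could be, for example, active voice sessions per pod.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count following the core Kubernetes HPA scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do nothing
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For instance, 4 pods averaging 90 sessions against a target of 60 scale to 6 pods, while an average of 62 sits inside the tolerance band and leaves the deployment at 4.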
Project Timeline
Setup & Infrastructure Planning
Provisioned GKE, Groq, CI/CD pipelines, and a cloud observability stack.
TTS Pipeline Development
Built the custom TTS pipeline with SGLang integration and LLaMA-based correction.
Model Deployment & Voice Cloning
Deployed the TTS model, then trained and validated 5 custom voices.
Integration & Optimization
Integrated with the client's voice agents and added an STT-based feedback loop.