Voice AI, Conversational Agents, AI/ML
Completed

Conversational AI Agent for Healthcare Client

We built a high-performance custom TTS pipeline, combining our proprietary TTS system, a LLaMA LLM, and voice cloning, and integrated it into the client's voice agents to power real-time medical check-ups.

Client
Healthcare Client
Duration
6 months
Team Size
4 engineers
Completed
September 2025

Project Overview

A robust, scalable TTS solution powering medical voice agents with cloned voices and real-time audio streaming, designed to replace commercial APIs with faster, self-hosted AI inference.

The Challenge

The client needed a customizable, privacy-compliant voice stack with rapid response time and domain-specific language support for healthcare agents.

Our Solution

We deployed a custom TTS model on GKE with Groq inference, used llama-3.3-70b-versatile for intelligent voice correction, integrated Deepgram for STT, and cloned 3 unique voices using our voice training pipeline.
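The LLaMA-based correction step can be sketched as a chat-completion call against Groq's OpenAI-compatible endpoint. The helper name, system prompt, and example text below are illustrative, not the production values:

```python
import json
import urllib.request

# Groq exposes an OpenAI-compatible chat-completions API
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_correction_request(raw_text: str, api_key: str) -> urllib.request.Request:
    """Assemble a request asking llama-3.3-70b-versatile to normalize
    text before synthesis (expand abbreviations, fix medical terms)."""
    body = json.dumps({
        "model": "llama-3.3-70b-versatile",
        "messages": [
            {"role": "system",
             "content": "Rewrite the text so a TTS engine pronounces "
                        "medical terms and dosages correctly."},
            {"role": "user", "content": raw_text},
        ],
    }).encode()
    return urllib.request.Request(
        GROQ_URL, data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )

# The corrected text would come back via:
# urllib.request.urlopen(build_correction_request("Take 2 tabs PO BID", key))
```

Running the correction before synthesis means the TTS model only ever sees pronounceable, expanded text.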

Key Features

Ultra Low-Latency TTS

An optimized inference path with Groq and GKE-based streaming keeps latency consistently under 300 ms.

Voice Cloning & Personalization

Cloned 3 distinct brand voices with emotional and tonal control suited for healthcare use cases.

End-to-End Integration with Voice Agents

Real-time integration with Livekit and FastAPI for conversational AI capabilities in patient-facing applications.

Technical Challenges & Solutions

Latency Bottlenecks

Off-the-shelf APIs were too slow for real-time interaction.

Solution: Used a customized Sglang deployment for inference acceleration and WebSockets for streaming audio to clients.
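Why streaming matters can be seen in a toy asyncio model, where per-fragment synthesis time is simulated with a sleep (the timings are illustrative, not measured from the real pipeline):

```python
import asyncio
import time

async def synth_chunk(fragment: str) -> bytes:
    await asyncio.sleep(0.05)  # stand-in for per-fragment inference time
    return fragment.encode()

async def blocking_tts(fragments: list[str]) -> float:
    """Off-the-shelf-API style: the caller waits for the whole utterance."""
    start = time.monotonic()
    for f in fragments:
        await synth_chunk(f)
    return time.monotonic() - start

async def streaming_tts(fragments: list[str]) -> float:
    """Streaming style: the first chunk reaches the client immediately,
    while the remaining chunks are synthesized behind it."""
    start = time.monotonic()
    await synth_chunk(fragments[0])
    return time.monotonic() - start  # time-to-first-audio

frags = ["Hello,", "how are you", "feeling today?"]
full_wait = asyncio.run(blocking_tts(frags))
first_audio = asyncio.run(streaming_tts(frags))
# time-to-first-audio shrinks with fragment count when chunks are streamed
```

Perceived latency is the time to the first audible sample, not total synthesis time, which is why chunked streaming beats a request/response API even on identical hardware.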

Voice Consistency Across Sessions

Maintaining naturalness and identity of cloned voices across different sessions.

Solution: Trained correction layers using LLaMA and feedback from STT alignment.
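The STT-alignment feedback can be sketched as a simple similarity check: re-transcribe the synthesized audio (e.g. with Deepgram) and score it against the intended text. The function name and threshold usage are hypothetical, and a real pipeline would use a proper word-error-rate metric:

```python
import difflib

def alignment_score(intended: str, transcript: str) -> float:
    """Hypothetical QA metric: compare the STT transcript of synthesized
    audio against the intended text; low scores flag utterances for the
    correction layer to learn from."""
    return difflib.SequenceMatcher(
        None, intended.lower(), transcript.lower()).ratio()

# identical transcripts score 1.0; any divergence lowers the score
```

Utterances scoring below a chosen threshold would be queued as training feedback for the correction layers.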

Scalability for Concurrent Sessions

The system needed to support many simultaneous agents and users.

Solution: Deployed auto-scaling GKE pods with load balancing and health checks.
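The auto-scaling setup can be sketched as a standard HorizontalPodAutoscaler manifest; the names, replica counts, and CPU target below are illustrative, not the client's actual configuration:

```yaml
# Illustrative HPA: scale TTS inference pods on CPU load
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tts-inference
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tts-inference
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Combined with readiness probes on the Deployment, this lets GKE route traffic only to pods that have finished loading the model.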

Project Timeline

Phase 1: Setup & Infrastructure Planning (1 week)

Provisioned GKE, Groq, CI/CD pipelines, and the cloud observability stack.

Deliverables: Terraform scripts, cluster setup, Cloud Monitoring

Phase 2: TTS Pipeline Development (2 weeks)

Built the custom TTS pipeline with Sglang integration and LLaMA-based correction.

Deliverables: custom TTS model, Sglang inference layer, LLaMA correction module

Phase 3: Model Deployment & Voice Cloning (2 weeks)

Deployed the TTS model, then trained and validated 5 custom voices.

Deliverables: TTS model deployment on GKE, cloned voice samples, latency benchmarking

Phase 4: Integration & Optimization (2 weeks, ongoing)

Integrated with the client's voice agents and added a feedback loop via STT.

Deliverables: WebSocket streaming APIs, Livekit voice agent interface, STT-integrated feedback layer

Key Results

Latency: < 300 ms, down from 1.2 s with the previous commercial API
Voice Models: 5 cloned, custom-trained voices integrated into live agents
Inference Speed: real-time streaming inference using WebSockets + Groq

Technologies Used

PyTorch
LLaMA
Sglang
Deepgram STT
FastAPI
WebSockets
Livekit
Docker
GKE
Terraform
Groq

Before vs After

Latency: 1.2 s (using ElevenLabs) → < 300 ms (our TTS)
Customization: limited to existing voice catalog → fully custom cloned voices
Deployment: third-party cloud dependent → on GKE with full control & observability

Interested in Our Work?

Let's discuss how we can build a similar solution for your business