Voice AI, Conversational Agents, AI/ML
Completed

Conversational AI Agent for Healthcare Client

We built a high-performance custom TTS pipeline, combining our proprietary TTS system, a LLaMA LLM, and voice cloning, and integrated it into the client's voice agents to power real-time medical check-ups.

Client
Healthcare Client
Duration
6 months
Team Size
4 engineers
Completed
September 2025

Project Overview

A robust, scalable TTS solution powering medical voice agents with cloned voices and real-time audio streaming, designed to replace commercial APIs with faster, self-hosted AI inference.

The Challenge

The client needed a customizable, privacy-compliant voice stack with rapid response time and domain-specific language support for healthcare agents.

Our Solution

We deployed a custom TTS model on GKE with Groq inference, used llama-3.3-70b-versatile for intelligent voice correction, integrated Deepgram for STT, and cloned 3 unique voices using our voice training pipeline.
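The LLaMA-based correction step can be sketched as a chat-completion call against Groq's OpenAI-compatible endpoint. The helper name, system prompt, and example text below are illustrative, not the production values:

```python
import json
import urllib.request

# Groq exposes an OpenAI-compatible chat-completions API
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_correction_request(raw_text: str, api_key: str) -> urllib.request.Request:
    """Assemble a request asking llama-3.3-70b-versatile to normalize
    text before synthesis (expand abbreviations, fix medical terms)."""
    body = json.dumps({
        "model": "llama-3.3-70b-versatile",
        "messages": [
            {"role": "system",
             "content": "Rewrite the text so a TTS engine pronounces "
                        "medical terms and dosages correctly."},
            {"role": "user", "content": raw_text},
        ],
    }).encode()
    return urllib.request.Request(
        GROQ_URL, data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )

# The corrected text would come back via:
# urllib.request.urlopen(build_correction_request("Take 2 tabs PO BID", key))
```

Running the correction before synthesis means the TTS model only ever sees pronounceable, expanded text.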

Key Features

Ultra Low-Latency TTS

An optimized inference path with Groq and GKE-based streaming keeps latency consistently under 300 ms.

Voice Cloning & Personalization

Cloned 3 distinct brand voices with emotional and tonal control suited for healthcare use cases.

End-to-End Integration with Voice Agents

Real-time integration with Livekit and FastAPI for conversational AI capabilities in patient-facing applications.

Technical Challenges & Solutions

Latency Bottlenecks

Off-the-shelf APIs were too slow for real-time interaction.

Solution: Used a customized Sglang deployment for inference acceleration and WebSockets for streaming audio to clients.
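Why streaming matters can be seen in a toy asyncio model, where per-fragment synthesis time is simulated with a sleep (the timings are illustrative, not measured from the real pipeline):

```python
import asyncio
import time

async def synth_chunk(fragment: str) -> bytes:
    await asyncio.sleep(0.05)  # stand-in for per-fragment inference time
    return fragment.encode()

async def blocking_tts(fragments: list[str]) -> float:
    """Off-the-shelf-API style: the caller waits for the whole utterance."""
    start = time.monotonic()
    for f in fragments:
        await synth_chunk(f)
    return time.monotonic() - start

async def streaming_tts(fragments: list[str]) -> float:
    """Streaming style: the first chunk reaches the client immediately,
    while the remaining chunks are synthesized behind it."""
    start = time.monotonic()
    await synth_chunk(fragments[0])
    return time.monotonic() - start  # time-to-first-audio

frags = ["Hello,", "how are you", "feeling today?"]
full_wait = asyncio.run(blocking_tts(frags))
first_audio = asyncio.run(streaming_tts(frags))
# time-to-first-audio shrinks with fragment count when chunks are streamed
```

Perceived latency is the time to the first audible sample, not total synthesis time, which is why chunked streaming beats a request/response API even on identical hardware.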

Voice Consistency Across Sessions

Maintaining naturalness and identity of cloned voices across different sessions.

Solution: Trained correction layers using LLaMA and feedback from STT alignment.
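The STT-alignment feedback can be sketched as a simple similarity check: re-transcribe the synthesized audio (e.g. with Deepgram) and score it against the intended text. The function name and threshold usage are hypothetical, and a real pipeline would use a proper word-error-rate metric:

```python
import difflib

def alignment_score(intended: str, transcript: str) -> float:
    """Hypothetical QA metric: compare the STT transcript of synthesized
    audio against the intended text; low scores flag utterances for the
    correction layer to learn from."""
    return difflib.SequenceMatcher(
        None, intended.lower(), transcript.lower()).ratio()

# identical transcripts score 1.0; any divergence lowers the score
```

Utterances scoring below a chosen threshold would be queued as training feedback for the correction layers.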

Scalability for Concurrent Sessions

The system needed to support many simultaneous agents and users.

Solution: Deployed auto-scaling GKE pods with load balancing and health checks.
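The auto-scaling setup can be sketched as a standard HorizontalPodAutoscaler manifest; the names, replica counts, and CPU target below are illustrative, not the client's actual configuration:

```yaml
# Illustrative HPA: scale TTS inference pods on CPU load
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tts-inference
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tts-inference
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Combined with readiness probes on the Deployment, this lets GKE route traffic only to pods that have finished loading the model.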

Project Timeline

Phase 1: Setup & Infrastructure Planning (1 week)

Provisioned GKE, Groq, CI/CD pipelines, and the cloud observability stack.

Deliverables: Terraform scripts, cluster setup, Cloud Monitoring

Phase 2: TTS Pipeline Development (2 weeks)

Built the custom TTS pipeline with Sglang integration and LLaMA-based correction.

Deliverables: custom TTS model, Sglang inference layer, LLaMA correction module

Phase 3: Model Deployment & Voice Cloning (2 weeks)

Deployed the TTS model, then trained and validated 5 custom voices.

Deliverables: TTS model deployment on GKE, cloned voice samples, latency benchmarking

Phase 4: Integration & Optimization (2 weeks, ongoing)

Integrated with the client's voice agents and added a feedback loop via STT.

Deliverables: WebSocket streaming APIs, Livekit voice agent interface, STT-integrated feedback layer

Key Results

Latency: < 300 ms, down from 1.2 s with the previous commercial API
Voice Models: 5 cloned, custom-trained voices integrated into live agents
Inference Speed: real-time streaming inference using WebSockets + Groq

Technologies Used

PyTorch
LLaMA
Sglang
Deepgram STT
FastAPI
WebSockets
Livekit
Docker
GKE
Terraform
Groq

Before vs After

Latency: 1.2 s (using ElevenLabs) → < 300 ms (our TTS)
Customization: limited to existing voice catalog → fully custom cloned voices
Deployment: third-party cloud dependent → on GKE with full control & observability

Interested in Our Work?

Let's discuss how we can build a similar solution for your business