Core Features
Ultra-Low Latency
Sub-500ms end-to-end response times for real-time conversations
Real-time Rendering
Frame-accurate lip sync and emotion-driven facial expressions
Natural Conversations
Human-like dialogue with emotional intelligence and empathetic responses
Multilingual Support
Over 50 languages with 95%+ recognition accuracy and seamless language switching
Context Management
Maintain conversation history and context across sessions
Preset Characters
Ready-to-use digital avatars for various use cases and industries
Custom Characters
Create and deploy your own custom digital character avatars
Knowledge Base Integration
Connect enterprise or personal knowledge bases for expert-level, context-aware responses
Function Calling
Integrate external APIs and execute custom functions during conversations
Unified WebSocket Architecture
NavTalk uses a single unified WebSocket connection that handles all real-time communication, including:
- Real-time API communication: audio input streaming and text/audio responses
- WebRTC signaling: video stream setup and ICE candidate exchange
- Session management: configuration and conversation history
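Because all three kinds of traffic share one connection, a client typically routes incoming frames by a message-type field. The sketch below shows one way to do that in Python; the specific `type` values (`audio.delta`, `webrtc.ice_candidate`) are illustrative assumptions, not the documented NavTalk protocol.

```python
import json

class UnifiedSocketRouter:
    """Route every frame from the single WebSocket by its message type."""

    def __init__(self):
        self._handlers = {}

    def on(self, msg_type, handler):
        # Register a handler for one message type.
        self._handlers[msg_type] = handler

    def dispatch(self, raw):
        # Parse a text frame and route it; unknown types are ignored.
        msg = json.loads(raw)
        handler = self._handlers.get(msg.get("type"))
        return handler(msg) if handler else None

router = UnifiedSocketRouter()
# Hypothetical event names for illustration only:
router.on("audio.delta", lambda m: ("audio", len(m["data"])))
router.on("webrtc.ice_candidate", lambda m: ("signaling", m["candidate"]))
```

With this pattern, audio, signaling, and session events all flow through one `dispatch` call, which is what makes the single-connection design easy to monitor and debug.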
Simplified Integration
Only one WebSocket connection to manage, reducing complexity and potential connection issues. No need to coordinate multiple connections or handle separate WebRTC signaling channels.
Better Performance
Reduced connection overhead and improved reliability with a single persistent connection. All communication flows through one optimized channel.
Easier Debugging
All events and messages flow through a single connection, making it easier to monitor, log, and debug issues in your application.
Automatic Synchronization
WebRTC signaling is automatically synchronized with the audio stream, eliminating timing issues and ensuring smooth audio-video synchronization.
How It Works
The Real-time Digital Human API uses a direct audio-to-audio processing pipeline that eliminates the traditional text conversion steps (STT and TTS), delivering sub-500ms latency and a natural conversation flow.

1. Audio Input
Your application captures user audio input and sends audio streams through WebSocket connections. This layer handles real-time audio streaming and ensures continuous bidirectional communication for seamless dialogue.
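A common way to stream captured audio over a WebSocket is to wrap each raw PCM chunk in a base64-encoded JSON frame. The frame shape below (`"type": "input_audio.append"`) is an assumption for illustration; consult the NavTalk API reference for the actual schema.

```python
import base64
import json

def audio_frame(pcm16_bytes: bytes) -> str:
    """Wrap a raw PCM16 audio chunk in a JSON text frame.

    The "input_audio.append" type name is hypothetical.
    """
    return json.dumps({
        "type": "input_audio.append",
        "audio": base64.b64encode(pcm16_bytes).decode("ascii"),
    })

# Example: 160 fake 16-bit samples (20 ms at 8 kHz)
frame = audio_frame(b"\x00\x01" * 160)
```

Sending small, frequent frames like this keeps the input stream continuous, which is what enables the bidirectional, interruptible dialogue described above.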
2. Direct Processing
GPT-realtime processes audio signals directly without text conversion steps. By eliminating Speech-to-Text (STT) and Text-to-Speech (TTS) transformations, the system achieves sub-500ms latency and enables natural interruption handling.
3. Audio Output
The processed audio response is generated in real-time and synchronized with video rendering. This layer delivers high-quality audio output with preserved fidelity, maintaining natural voice tone and emotional nuances.
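On the client side, streamed output typically arrives as a sequence of deltas that are accumulated into a playable buffer. The sketch below assumes each delta is base64-encoded PCM, mirroring the input-side encoding; the real response schema may differ.

```python
import base64

class AudioOutputBuffer:
    """Accumulate streamed audio-output deltas into one playable buffer."""

    def __init__(self):
        self._chunks = bytearray()

    def append_delta(self, b64_delta: str):
        # Decode each delta as it arrives and append it in order.
        self._chunks.extend(base64.b64decode(b64_delta))

    def pcm_bytes(self) -> bytes:
        # Hand the contiguous PCM data to an audio playback API.
        return bytes(self._chunks)

buf = AudioOutputBuffer()
buf.append_delta(base64.b64encode(b"\x01\x02").decode())
```

Appending deltas in arrival order preserves the stream's timing, which the renderer relies on to keep lip sync frame-accurate.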
4. Visual Rendering
Frame-accurate lip sync and emotion-driven facial expressions are rendered in real-time, creating a lifelike visual presence synchronized with the audio output.
Try Our Demo
We provide simple, single-page demos in multiple languages and platforms that you can clone and run with one click. To get started:
- Register for an account and obtain your API key from the dashboard
- Clone the Samples repository: git clone https://github.com/navtalk/Samples.git
- Configure your API key in the demo files
- Run the demo: each demo is a single-page application that works immediately