WebSocket Connection
A WebSocket connection is used to send user audio data to the NavTalk API for processing. It is the primary channel for transmitting your audio input to the digital human system. The complete connection process uses one unified WebSocket connection that handles:
- Real-time API communication - for sending audio input and receiving text/audio responses
- WebRTC signaling - for establishing the video stream (signaling messages are sent through the same WebSocket connection)
Step 1: Establish WebSocket Connection
First, establish a unified WebSocket connection to the NavTalk API with your API key and character name. This single connection will be used for all communication including audio data, text/audio responses, and WebRTC signaling.
The WebSocket connection URL requires one mandatory parameter and supports two query methods:
- license: Your API key (required)
- name: The name of the digital human character (query method 1)
- avatarId: Direct avatar ID for precise lookup (query method 2, higher priority)
If both avatarId and name are provided, avatarId takes precedence.
Multiple Avatars Warning: If you query by name and multiple avatars share that name, the system will:
- Automatically select the most recently updated avatar
- Send a conversation.connected.warning event with the selected avatarId immediately after the connection success event
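The parameter rules above can be sketched as a small URL builder. The base endpoint URL below is a placeholder, not the documented NavTalk endpoint; only the query parameters (license, name, avatarId) come from this guide:

```typescript
// Assumed base URL for illustration only; substitute the real NavTalk endpoint.
const NAVTALK_WS_BASE = "wss://example.navtalk-endpoint.invalid/realtime";

function buildConnectionUrl(opts: {
  license: string;   // required API key
  name?: string;     // query method 1
  avatarId?: string; // query method 2 (takes precedence)
}): string {
  const params = new URLSearchParams({ license: opts.license });
  if (opts.avatarId) {
    // avatarId wins when both avatarId and name are supplied
    params.set("avatarId", opts.avatarId);
  } else if (opts.name) {
    params.set("name", opts.name);
  }
  return `${NAVTALK_WS_BASE}?${params.toString()}`;
}

// Usage: const ws = new WebSocket(buildConnectionUrl({ license: "YOUR_API_KEY", name: "Ava" }));
```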
Step 2: Configure Session and Handle Session Events
After the WebSocket connection is established, the server automatically sends realtime.session.created and realtime.session.updated events. After receiving realtime.session.created, send your conversation history (if any). Once you receive realtime.session.updated, you can start sending audio data.
Session configuration (voice, prompt, tools) can be set via the Console interface or API configuration. The sendSessionUpdate() function is used to send conversation history after receiving the realtime.session.created event. The conversation history allows the AI to maintain context across sessions. Only user messages need to be sent; assistant messages are handled by the server.
Step 3: Capture and Send Audio
Once you receive the realtime.session.updated event, you can start capturing audio from the user’s microphone and sending it through the WebSocket connection. The audio must be in PCM16 format at a 24 kHz sample rate, mono channel.
Step 4: Handle Response Messages
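The Web Audio API delivers Float32 samples in the range [-1, 1], so they must be converted to PCM16 before sending. A minimal conversion sketch (the capture setup in the comments assumes a browser environment; the exact audio-append message shape is not specified here):

```typescript
// Convert Float32 samples ([-1, 1]) to little-endian PCM16, the format
// the API expects at 24 kHz mono.
function floatToPcm16(samples: Float32Array): ArrayBuffer {
  const buf = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buf);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to valid range
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buf;
}

// Capture sketch (browser only):
//   const ctx = new AudioContext({ sampleRate: 24000 }); // request 24 kHz
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   // feed microphone samples through an AudioWorklet, then:
//   //   ws.send(floatToPcm16(chunk));  // or base64-encode, per the API's framing
```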
Process incoming messages from the API. The WebSocket connection will send various event types including transcriptions, AI responses, and status updates.
When the WebSocket connection is established, you will receive a conversation.connected.success event. This event contains:
- data.sessionId: The session ID you must use for WebRTC signaling
- data.iceServers: ICE server configuration for WebRTC
Store the sessionId value as soon as it arrives because you must reuse it later for WebRTC signaling.
The API sends events in a specific sequence:
- conversation.connected.success → Connection established; contains sessionId and iceServers for WebRTC
- realtime.session.created → Send conversation history
- realtime.session.updated → Start sending audio
- realtime.input_audio_buffer.speech_started → User starts speaking
- realtime.input_audio_buffer.speech_stopped → User stops speaking
- realtime.conversation.item.input_audio_transcription.completed → User speech transcribed
- realtime.response.audio_transcript.delta → AI response text (streaming, multiple events)
- realtime.response.audio_transcript.done → AI response text complete
- realtime.response.audio.done → AI response audio complete
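The sequence above can be handled with a single dispatcher on the WebSocket's message handler. This is a sketch under assumptions: the conversation-history message shape (conversation.item.create) is hypothetical, and only the event names listed above are taken from this guide:

```typescript
type NavTalkEvent = { type: string; data?: any; [key: string]: any };

interface SessionState {
  sessionId?: string;
  canSendAudio: boolean;
}

// `send` stands in for ws.send(JSON.stringify(...)); `history` holds prior
// user messages (assistant messages are handled server-side).
function handleEvent(
  evt: NavTalkEvent,
  send: (msg: object) => void,
  state: SessionState,
  history: { role: "user"; text: string }[] = []
): void {
  switch (evt.type) {
    case "conversation.connected.success":
      // Store the sessionId immediately; it is needed for WebRTC signaling.
      state.sessionId = evt.data?.sessionId ?? evt.sessionId;
      break;
    case "realtime.session.created":
      // Send conversation history (user messages only), if any.
      for (const item of history) {
        send({ type: "conversation.item.create", item }); // hypothetical payload shape
      }
      break;
    case "realtime.session.updated":
      state.canSendAudio = true; // safe to start streaming microphone audio
      break;
    case "realtime.response.audio_transcript.delta":
      // Streaming AI response text; append (evt.data?.delta ?? evt.delta) to the UI.
      break;
  }
}
```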
Note: Some events nest their payload in a data field. Always check both data.data and data when accessing event properties.
Step 5: Establish WebRTC Connection (for Video)
To receive the digital human’s video stream, WebRTC signaling messages are sent through the same unified WebSocket connection. This is covered in detail in the WebRTC Connection guide.
In the new unified API, WebRTC signaling (offer, answer, ICE candidates) is handled through the same WebSocket connection using these event types:
- webrtc.signaling.offer - Receive a WebRTC offer
- webrtc.signaling.answer - Send your WebRTC answer
- webrtc.signaling.iceCandidate - Exchange ICE candidates
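A sketch of answering an incoming offer over the same WebSocket. The exact message envelope (sessionId placement, data.sdp field) is an assumption based on the fields described above; consult the WebRTC Connection guide for the authoritative shapes:

```typescript
// Assumed envelope: event type + the sessionId from conversation.connected.success.
function makeSignalingMessage(
  type: "webrtc.signaling.answer" | "webrtc.signaling.iceCandidate",
  sessionId: string,
  payload: object
): { type: string; sessionId: string; data: object } {
  return { type, sessionId, data: payload };
}

// Browser environment: pc is an RTCPeerConnection built with the iceServers
// received in conversation.connected.success.
async function answerOffer(pc: any, ws: any, sessionId: string, offerSdp: string) {
  await pc.setRemoteDescription({ type: "offer", sdp: offerSdp });
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  ws.send(JSON.stringify(
    makeSignalingMessage("webrtc.signaling.answer", sessionId, { sdp: answer.sdp })
  ));
  // Local ICE candidates would be forwarded similarly via
  // webrtc.signaling.iceCandidate as pc.onicecandidate fires.
}
```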