WebSocket Connection

A WebSocket connection is used to establish the session and send user audio data to the NavTalk API for processing. This is the primary channel for transmitting your audio input to the digital human system. The complete connection process involves one unified WebSocket connection that handles:
  1. Real-time API communication - for sending audio input and receiving text/audio responses
  2. WebRTC signaling - for establishing video stream (WebRTC signaling messages are sent through the same WebSocket connection)

Step 1: Establish WebSocket Connection

First, establish a unified WebSocket connection to the NavTalk API with your API key and character name. This single connection will be used for all communication including audio data, text/audio responses, and WebRTC signaling.
// Define event type constants
const NavTalkMessageType = Object.freeze({
    CONNECTED_SUCCESS: "conversation.connected.success",
    CONNECTED_FAIL: "conversation.connected.fail",
    CONNECTED_CLOSE: "conversation.connected.close",
    INSUFFICIENT_BALANCE: "conversation.connected.insufficient_balance",
    CONNECTED_WARNING: "conversation.connected.warning",
    WEB_RTC_OFFER: "webrtc.signaling.offer",
    WEB_RTC_ANSWER: "webrtc.signaling.answer",
    WEB_RTC_ICE_CANDIDATE: "webrtc.signaling.iceCandidate",
    REALTIME_SESSION_CREATED: "realtime.session.created",
    REALTIME_SESSION_UPDATED: "realtime.session.updated",
    REALTIME_SPEECH_STARTED: "realtime.input_audio_buffer.speech_started",
    REALTIME_SPEECH_STOPPED: "realtime.input_audio_buffer.speech_stopped",
    REALTIME_CONVERSATION_ITEM_COMPLETED: "realtime.conversation.item.input_audio_transcription.completed",
    REALTIME_RESPONSE_AUDIO_TRANSCRIPT_DELTA: "realtime.response.audio_transcript.delta",
    REALTIME_RESPONSE_AUDIO_DELTA: "realtime.response.audio.delta",
    REALTIME_RESPONSE_AUDIO_TRANSCRIPT_DONE: "realtime.response.audio_transcript.done",
    REALTIME_RESPONSE_AUDIO_DONE: "realtime.response.audio.done",
    REALTIME_RESPONSE_FUNCTION_CALL_ARGUMENTS_DONE: "realtime.response.function_call_arguments.done",
    REALTIME_INPUT_AUDIO_BUFFER_APPEND: "realtime.input_audio_buffer.append",
    REALTIME_INPUT_TEXT: "realtime.input_text",
    REALTIME_INPUT_IMAGE: "realtime.input_image"
});

// Configuration
const LICENSE = 'your-api-key-here';
const CHARACTER_NAME = 'navtalk.Leo';
// Optional: Use avatarId for precise avatar lookup
const AVATAR_ID = 'your-avatar-id'; // Get from avatar list API

// Build WebSocket URL with query parameters
const websocketUrl = 'wss://transfer.navtalk.ai/wss/v2/realtime-chat';

// Option 1: Connect using character name
const websocketUrlWithParams = `${websocketUrl}?license=${encodeURIComponent(LICENSE)}&name=${encodeURIComponent(CHARACTER_NAME)}`;

// Option 2: Connect using avatarId (recommended, higher priority)
// const websocketUrlWithParams = `${websocketUrl}?license=${encodeURIComponent(LICENSE)}&avatarId=${encodeURIComponent(AVATAR_ID)}`;

// Create WebSocket connection
const socket = new WebSocket(websocketUrlWithParams);
// Important: set binary type to 'arraybuffer' for handling binary audio data
socket.binaryType = 'arraybuffer';

// Connection event handlers
socket.onopen = () => {
  console.log('WebSocket connection established');
};

socket.onerror = (error) => {
  console.error('WebSocket error:', error);
  // Handle connection errors
};

socket.onclose = (event) => {
  console.log('WebSocket connection closed', event.code, event.reason);
  // Handle connection closure
  // Common reasons: 'Insufficient points', normal closure, etc.
};
The WebSocket connection URL requires one mandatory parameter and supports two query methods:
  • license: Your API key (required)
  • name: The name of the digital human character (query method 1)
  • avatarId: Direct avatar ID for precise lookup (query method 2, higher priority)
Query Priority: If both avatarId and name are provided, avatarId takes precedence.
Multiple Avatars Warning: If you query by name and multiple avatars share the same name, the system will:
  • Automatically select the most recently updated avatar
  • Send a conversation.connected.warning event with the selected avatarId immediately after the connection success event
This unified connection handles both real-time API communication and WebRTC signaling, eliminating the need for a separate WebRTC WebSocket connection.
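The two URL-building options above can be collapsed into a single helper that enforces the query priority described. This is a hypothetical convenience function, not part of any NavTalk SDK:

```javascript
// Hypothetical helper: builds the connection URL, preferring avatarId over
// name, matching the query priority described above.
function buildNavTalkUrl({ license, name, avatarId },
                         base = 'wss://transfer.navtalk.ai/wss/v2/realtime-chat') {
  if (!license) throw new Error('license is required');
  const url = new URL(base);
  url.searchParams.set('license', license); // URLSearchParams handles encoding
  if (avatarId) {
    url.searchParams.set('avatarId', avatarId); // wins when both are given
  } else if (name) {
    url.searchParams.set('name', name);
  } else {
    throw new Error('Provide either avatarId or name');
  }
  return url.toString();
}
```

Using `URL`/`URLSearchParams` instead of string concatenation also takes care of percent-encoding, so the manual `encodeURIComponent` calls become unnecessary.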

Step 2: Configure Session and Handle Session Events

After the WebSocket connection is established, the server will automatically send realtime.session.created and realtime.session.updated events. After receiving realtime.session.created, send conversation history (if any). Once you receive realtime.session.updated, you can start sending audio data.
// Global variables for session management
// Note: `socket` is the WebSocket created in Step 1
let conversationHistory = []; // Optional: store conversation history
const configuration = { iceServers: [] }; // RTCPeerConnection config; populated on connect, used in Step 5

socket.onmessage = (event) => {
  // Handle both string (JSON) and binary messages
  if (typeof event.data === 'string') {
    try {
      const data = JSON.parse(event.data);
      handleReceivedMessage(data);
    } catch (e) {
      console.error('Failed to parse JSON message:', e);
    }
  } else if (event.data instanceof ArrayBuffer) {
    // Handle binary audio data if needed
    handleReceivedBinaryMessage(event.data);
  }
};

async function handleReceivedMessage(data) {
  // Extract data from nested structure
  const nav_data = data.data;
  
  switch (data.type) {
    // Connection events
    case NavTalkMessageType.CONNECTED_SUCCESS:
      console.log('WebSocket connection successful');
      // Extract sessionId and iceServers from data
      if (nav_data && nav_data.sessionId) {
        const sessionId = nav_data.sessionId;
        console.log('Received session ID:', sessionId);
        // Store sessionId for WebRTC signaling (see Step 5)
        // Store iceServers for WebRTC configuration
        if (nav_data.iceServers) {
          configuration.iceServers = nav_data.iceServers;
          console.log('ICE servers configured:', configuration.iceServers);
        }
      }
      break;
    
    case NavTalkMessageType.CONNECTED_FAIL:
      const errorMessage = data.message || 'Unknown error';
      console.error('Connection failed:', errorMessage);
      // Handle connection failure
      break;
    
    case NavTalkMessageType.CONNECTED_CLOSE:
      console.log('Connection closed:', data.message || 'Normal closure');
      // Handle connection closure
      break;
    
    case NavTalkMessageType.INSUFFICIENT_BALANCE:
      console.error('Insufficient balance, service has stopped, please recharge!');
      // Handle insufficient balance
      break;
    
    case NavTalkMessageType.CONNECTED_WARNING:
      const warningMsg = data.message || 'Connection warning';
      console.warn('Connection warning:', warningMsg);
      // Display warning to user (e.g., toast notification)
      break;
    
    // Step 2.1: Session created - send conversation history
    case NavTalkMessageType.REALTIME_SESSION_CREATED:
      console.log('Session created, sending conversation history.');
      await sendSessionUpdate();
      break;
    
    // Step 2.2: Session updated - ready to send audio
    case NavTalkMessageType.REALTIME_SESSION_UPDATED:
      console.log('Session updated. Ready to receive audio.');
      // Now you can start recording and sending audio
      startRecording();
      break;

    default:
      console.warn('Unhandled event type:', data.type);
  }
}

// Send conversation history after session is created
async function sendSessionUpdate() {
  // Get conversation history from storage
  const history = localStorage.getItem('realtimeChatHistory');
  const conversationHistory = history ? JSON.parse(history) : [];
  
  // Send each item in history
  // Note: Only send user messages, assistant messages are handled by the server
  conversationHistory.forEach((msg) => {
    if (msg.role === 'user') {
      const messageConfig = {
        type: 'conversation.item.create',
        item: {
          type: 'message',
          role: msg.role,
          content: [
            {
              type: 'input_text',
              text: msg.content
            }
          ]
        }
      };
      
      try {
        socket.send(JSON.stringify(messageConfig));
        console.log('Sent history message:', msg.role);
      } catch (e) {
        console.error('Error sending history message:', e);
      }
    }
  });
}
Session configuration (voice, prompt, tools) can be set via the Console interface or API configuration. The sendSessionUpdate() function is used to send conversation history after receiving the realtime.session.created event. The conversation history allows the AI to maintain context across sessions. Only user messages need to be sent; assistant messages are handled by the server.
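The saveToHistory() helper is called later (Step 4) but never defined in this guide. A minimal sketch, assuming a JSON array under the same realtimeChatHistory key that sendSessionUpdate() reads; the injectable storage parameter is an addition here so the function also works outside the browser:

```javascript
// Sketch of saveToHistory(): appends one message to the persisted history.
// `storage` defaults to localStorage in the browser, but any object with
// getItem/setItem works (handy for testing).
function saveToHistory(role, content,
                       storage = globalThis.localStorage,
                       key = 'realtimeChatHistory') {
  const raw = storage.getItem(key);
  const history = raw ? JSON.parse(raw) : [];
  history.push({ role, content, ts: Date.now() }); // record shape is an assumption
  storage.setItem(key, JSON.stringify(history));
  return history.length;
}
```

On the next session, sendSessionUpdate() replays the user entries of this array via conversation.item.create messages.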

Step 3: Capture and Send Audio

Once you receive the realtime.session.updated event, you can start capturing audio from the user’s microphone and sending it through the WebSocket connection. The audio must be in PCM16 format at 24kHz sample rate, mono channel.
// Global variables for audio processing
let audioContext;
let audioProcessor;
let audioStream;

function startRecording() {
  // Request microphone access
  navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
      // Create AudioContext with 24kHz sample rate (required by API)
      audioContext = new (window.AudioContext || window.webkitAudioContext)({ 
        sampleRate: 24000 
      });
      audioStream = stream;
      
      // Create audio source from microphone stream
      const source = audioContext.createMediaStreamSource(stream);
      
      // Create ScriptProcessorNode to process audio chunks
      // (deprecated in modern browsers but still widely supported;
      //  AudioWorklet is the current replacement)
      // Parameters: bufferSize (8192), inputChannels (1), outputChannels (1)
      audioProcessor = audioContext.createScriptProcessor(8192, 1, 1);
      
      // Process audio data in real-time
      audioProcessor.onaudioprocess = (event) => {
        // Only send if WebSocket is open
        if (socket && socket.readyState === WebSocket.OPEN) {
          // Get audio data from input buffer (Float32Array, range -1.0 to 1.0)
          const inputBuffer = event.inputBuffer.getChannelData(0);
          
          // Step 3.1: Convert Float32Array to PCM16 (16-bit signed integers)
          const pcmData = floatTo16BitPCM(inputBuffer);
          
          // Step 3.2: Encode PCM16 data to base64 for JSON transmission
          const base64PCM = base64EncodeAudio(new Uint8Array(pcmData));
          
          // Step 3.3: Send audio chunks (split large base64 strings into smaller chunks)
          // Chunk size: 4096 characters to avoid message size limits
          const chunkSize = 4096;
          for (let i = 0; i < base64PCM.length; i += chunkSize) {
            const chunk = base64PCM.slice(i, i + chunkSize);
            
            // Send audio chunk as JSON message
            socket.send(JSON.stringify({
              type: NavTalkMessageType.REALTIME_INPUT_AUDIO_BUFFER_APPEND,
              data: { audio: chunk }
            }));
          }
        }
      };
      
      // Connect audio processing chain
      source.connect(audioProcessor);
      audioProcessor.connect(audioContext.destination);
      
      console.log('Recording started');
    })
    .catch(error => {
      console.error('Unable to access microphone:', error);
      // Handle microphone access errors
    });
}

// Helper function: Convert Float32Array to 16-bit PCM
// Input: Float32Array with values in range [-1.0, 1.0]
// Output: ArrayBuffer containing 16-bit PCM data
function floatTo16BitPCM(float32Array) {
  const buffer = new ArrayBuffer(float32Array.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  let offset = 0;
  
  for (let i = 0; i < float32Array.length; i++, offset += 2) {
    // Clamp value to [-1, 1] range
    let s = Math.max(-1, Math.min(1, float32Array[i]));
    
    // Convert to 16-bit signed integer
    // Negative values: s * 0x8000 (range: -32768 to 0)
    // Positive values: s * 0x7FFF (range: 0 to 32767)
    view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
  
  return buffer;
}

// Helper function: Encode audio to base64
// Input: Uint8Array of PCM16 data
// Output: Base64-encoded string
function base64EncodeAudio(uint8Array) {
  let binary = '';
  const chunkSize = 0x8000; // 32KB chunks to avoid stack overflow
  
  // Convert binary data to string (chunk by chunk)
  for (let i = 0; i < uint8Array.length; i += chunkSize) {
    const chunk = uint8Array.subarray(i, i + chunkSize);
    binary += String.fromCharCode.apply(null, chunk);
  }
  
  // Encode to base64
  return btoa(binary);
}
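For local playback or debugging you may want the inverse of the two helpers above. This sketch (not part of the NavTalk API) decodes a base64 PCM16 chunk back into a Float32Array, mirroring floatTo16BitPCM and base64EncodeAudio:

```javascript
// Decode a base64 PCM16 chunk back to Float32Array samples in [-1.0, 1.0]
function base64ToFloat32(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);

  const view = new DataView(bytes.buffer);
  const float32 = new Float32Array(bytes.length / 2); // 2 bytes per sample
  for (let i = 0; i < float32.length; i++) {
    const s = view.getInt16(i * 2, true); // little-endian, matching the encoder
    // Inverse of the asymmetric scaling used in floatTo16BitPCM
    float32[i] = s < 0 ? s / 0x8000 : s / 0x7FFF;
  }
  return float32;
}
```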

// Stop recording and cleanup
function stopRecording() {
  // Disconnect audio processor
  if (audioProcessor) {
    audioProcessor.disconnect();
    audioProcessor = null;
  }
  
  // Stop all audio tracks
  if (audioStream) {
    audioStream.getTracks().forEach(track => track.stop());
    audioStream = null;
  }
  
  // Release the AudioContext to free the audio hardware
  if (audioContext) {
    audioContext.close();
    audioContext = null;
  }
  
  // Close WebSocket connection
  if (socket) {
    socket.close();
  }
  
  console.log('Recording stopped');
}
Critical Audio Requirements:
  • Format: PCM16 (16-bit signed integers)
  • Sample Rate: 24kHz (24000 Hz)
  • Channels: Mono (1 channel)
  • Encoding: Base64 for JSON transmission
Make sure your audio processing matches these specifications exactly, otherwise the API may reject the audio data.
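The chunk-and-send loop inside startRecording() can be factored into a standalone helper, which also makes it easy to verify against a stub socket. The 4096-character size is the one used above; any multiple of 4 keeps each chunk a valid base64 unit on its own:

```javascript
// Split a base64 PCM payload into chunks and send each as a JSON message.
// Returns the number of messages sent.
function sendAudioChunks(socket, base64PCM, chunkSize = 4096) {
  let sent = 0;
  for (let i = 0; i < base64PCM.length; i += chunkSize) {
    socket.send(JSON.stringify({
      type: 'realtime.input_audio_buffer.append',
      data: { audio: base64PCM.slice(i, i + chunkSize) }
    }));
    sent++;
  }
  return sent;
}
```

Inside onaudioprocess you would call it as `sendAudioChunks(socket, base64PCM)` after the encode step.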

Step 4: Handle Response Messages

Process incoming messages from the API. The WebSocket connection will send various event types including transcriptions, AI responses, and status updates.
When the WebSocket connection is established, you will receive a conversation.connected.success event. This event contains:
  • data.sessionId: The session ID you must use for WebRTC signaling
  • data.iceServers: ICE server configuration for WebRTC
Capture the sessionId value as soon as it arrives because you must reuse it later for WebRTC signaling.
// Enhanced message handler with all event types
async function handleReceivedMessage(data) {
  // Extract data from nested structure
  const nav_data = data.data;
  
  switch (data.type) {
    // Connection events
    case NavTalkMessageType.CONNECTED_SUCCESS:
      console.log('WebSocket connection successful');
      // Extract sessionId and iceServers from data
      if (nav_data && nav_data.sessionId) {
        const sessionId = nav_data.sessionId;
        console.log('Received session ID:', sessionId);
        // Store sessionId for WebRTC signaling (see Step 5)
        // Store iceServers for WebRTC configuration
        if (nav_data.iceServers) {
          configuration.iceServers = nav_data.iceServers;
          console.log('ICE servers configured:', configuration.iceServers);
        }
      }
      break;
    
    case NavTalkMessageType.CONNECTED_FAIL:
      const errorMessage = data.message || 'Unknown error';
      console.error('Connection failed:', errorMessage);
      break;
    
    case NavTalkMessageType.CONNECTED_CLOSE:
      console.log('Connection closed:', data.message || 'Normal closure');
      break;
    
    case NavTalkMessageType.INSUFFICIENT_BALANCE:
      console.error('Insufficient balance, service has stopped, please recharge!');
      break;
    
    case NavTalkMessageType.CONNECTED_WARNING:
      const warningMsg = data.message || 'Connection warning';
      console.warn('Connection warning:', warningMsg);
      // Display warning to user (e.g., toast notification)
      break;
    
    // Session events (handled in Step 2)
    case NavTalkMessageType.REALTIME_SESSION_CREATED:
      await sendSessionUpdate();
      break;
    
    case NavTalkMessageType.REALTIME_SESSION_UPDATED:
      console.log('Session updated. Ready to receive audio.');
      startRecording();
      break;
    
    // User speech detection events
    case NavTalkMessageType.REALTIME_SPEECH_STARTED:
      console.log('Speech started detected by server.');
      // User started speaking - you might want to stop any current audio playback
      stopCurrentAudioPlayback();
      break;
    
    case NavTalkMessageType.REALTIME_SPEECH_STOPPED:
      console.log('Speech stopped detected by server.');
      // User stopped speaking - server will process the audio
      break;
    
    // User transcription events
    case NavTalkMessageType.REALTIME_CONVERSATION_ITEM_COMPLETED:
      const transcript = nav_data.content;
      console.log('Received transcription:', transcript);
      // Display user message in UI
      displayUserMessage(transcript);
      // Save to conversation history
      await saveToHistory('user', transcript);
      break;
    
    // AI response text stream (streaming)
    case NavTalkMessageType.REALTIME_RESPONSE_AUDIO_TRANSCRIPT_DELTA:
      // This event fires multiple times as the AI generates text
      const deltaTranscript = nav_data.content; // Incremental text chunk
      const responseId = nav_data.id; // Unique ID for this response
      
      // Accumulate text chunks (you may need to buffer incomplete content)
      accumulateResponseText(responseId, deltaTranscript);
      
      // Display streaming text in UI
      displayAIResponseStream(responseId, deltaTranscript);
      break;
    
    // AI response text completed
    case NavTalkMessageType.REALTIME_RESPONSE_AUDIO_TRANSCRIPT_DONE:
      const finalTranscript = nav_data.content;
      console.log('AI response complete:', finalTranscript);
      // Display final text
      displayAIResponseComplete(finalTranscript);
      // Save to conversation history
      await saveToHistory('assistant', finalTranscript);
      break;
    
    // AI response audio stream (if you're receiving audio via WebSocket)
    case NavTalkMessageType.REALTIME_RESPONSE_AUDIO_DELTA:
      // Handle audio delta if needed
      if (nav_data.delta) {
        // Process audio chunk
        handleAudioChunk(nav_data.delta);
      }
      break;
    
    // AI response audio completed
    case NavTalkMessageType.REALTIME_RESPONSE_AUDIO_DONE:
      console.log('Audio response complete.');
      // Handle audio completion
      break;
    
    // Function call events
    case NavTalkMessageType.REALTIME_RESPONSE_FUNCTION_CALL_ARGUMENTS_DONE:
      console.log('Function call received:', nav_data);
      await handleFunctionCall(nav_data);
      break;
    
    // WebRTC signaling events (handled in Step 5)
    case NavTalkMessageType.WEB_RTC_OFFER:
      handleOffer(nav_data);
      break;
    
    case NavTalkMessageType.WEB_RTC_ANSWER:
      handleAnswer(nav_data);
      break;
    
    case NavTalkMessageType.WEB_RTC_ICE_CANDIDATE:
      handleIceCandidate(nav_data);
      break;
    
    default:
      console.warn('Unhandled event type:', data.type);
  }
}

// Helper function to accumulate streaming text
let responseBuffers = new Map();

function accumulateResponseText(responseId, delta) {
  if (!responseBuffers.has(responseId)) {
    responseBuffers.set(responseId, '');
  }
  const current = responseBuffers.get(responseId);
  responseBuffers.set(responseId, current + delta);
}
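The buffer above never evicts entries, so long sessions accumulate stale responses. A companion finalizer can flush a response's full text when the audio_transcript.done event arrives. The names here are hypothetical, and the buffer is redeclared so the sketch runs standalone:

```javascript
// Self-contained buffer + finalizer for streamed response text
const responseTextBuffers = new Map();

function accumulateDelta(responseId, delta) {
  responseTextBuffers.set(responseId,
    (responseTextBuffers.get(responseId) || '') + delta);
}

// Returns the accumulated text and clears the buffer entry,
// so memory doesn't grow across responses
function finalizeResponse(responseId) {
  const text = responseTextBuffers.get(responseId) || '';
  responseTextBuffers.delete(responseId);
  return text;
}
```

Call `finalizeResponse(nav_data.id)` in the REALTIME_RESPONSE_AUDIO_TRANSCRIPT_DONE case if you prefer the buffered text over `nav_data.content`.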

// Handle binary messages (if any)
function handleReceivedBinaryMessage(arrayBuffer) {
  // Process binary audio data if needed
  console.log('Received binary message:', arrayBuffer.byteLength, 'bytes');
}
The API sends events in a specific sequence:
  1. conversation.connected.success → Connection established, contains sessionId and iceServers for WebRTC
  2. realtime.session.created → Send conversation history
  3. realtime.session.updated → Start sending audio
  4. realtime.input_audio_buffer.speech_started → User starts speaking
  5. realtime.input_audio_buffer.speech_stopped → User stops speaking
  6. realtime.conversation.item.input_audio_transcription.completed → User speech transcribed
  7. realtime.response.audio_transcript.delta → AI response text (streaming, multiple events)
  8. realtime.response.audio_transcript.done → AI response text complete
  9. realtime.response.audio.done → AI response audio complete
Note: Event data may be nested in a data field. Always check both data.data and data when accessing event properties.
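That nesting caveat can be handled once with a tiny accessor that prefers the nested data field when present. This is a hypothetical convenience, not part of the API:

```javascript
// Returns the event payload, whether fields are nested under `data`
// or sit at the top level of the message
function eventPayload(msg) {
  return (msg && typeof msg.data === 'object' && msg.data !== null)
    ? msg.data
    : msg;
}
```

With it, `eventPayload(data).sessionId` works for both message shapes, replacing the manual `data.data` checks in the handler.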

Step 5: Establish WebRTC Connection (for Video)

To receive the digital human’s video stream, WebRTC signaling messages are sent through the same unified WebSocket connection. This is covered in detail in the WebRTC Connection guide.
In the new unified API, WebRTC signaling (offer, answer, ICE candidates) is handled through the same WebSocket connection using event types:
  • webrtc.signaling.offer - Receive WebRTC offer
  • webrtc.signaling.answer - Send WebRTC answer
  • webrtc.signaling.iceCandidate - Exchange ICE candidates
No separate WebRTC WebSocket connection is needed.
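As a rough sketch, sending the WebRTC answer back through the unified socket might look like the following. The payload shape (the sessionId field and the sdp/sdpType nesting) is an assumption here; confirm the exact format against the WebRTC Connection guide before relying on it:

```javascript
// Send a WebRTC answer over the unified WebSocket.
// WARNING: the data shape below is an assumption, not a documented contract.
function sendSignalingAnswer(socket, sessionId, answer) {
  const message = {
    type: 'webrtc.signaling.answer',
    data: {
      sessionId,                // the sessionId from conversation.connected.success
      sdp: answer.sdp,          // RTCSessionDescription fields
      sdpType: answer.type
    }
  };
  socket.send(JSON.stringify(message));
  return message;
}
```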