Account and Authentication

Please visit the console at console.navtalk.ai. After registering and logging in, you can generate your License Key on the “API Key Management” page.
The License Key is valid indefinitely. If you believe it has been compromised, you can reset it immediately in the console.

Quick Start Questions

Please refer to the “API Call Overview” section in the documentation. You just need to provide video_url and audio_url to generate the video. The example response format is:
{
  "status": "started",
  "task_id": "xxxxx"
}
  1. Establish a WebSocket connection to wss://transfer.navtalk.ai/wss/v2/realtime-chat (include the license and name parameters in the URL).
  2. Wait for the conversation.connected.success event, which contains the session ID and ICE servers.
  3. Optionally send conversation history via conversation.item.create messages.
  4. Capture microphone audio and send it via realtime.input_audio_buffer.append.
  5. Receive AI response text/audio stream/video stream (WebRTC through the same connection).
You can download our complete code and run it directly.
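The first two steps above can be sketched as follows. This is a minimal illustration, not the official client: the endpoint and the license and name query parameters are taken from the steps, while the helper names buildRealtimeUrl and dispatchMessage, and the assumption that every server message is a JSON text frame with a top-level type field, are hypothetical.

```javascript
// Endpoint as given in step 1.
const REALTIME_ENDPOINT = 'wss://transfer.navtalk.ai/wss/v2/realtime-chat';

// Build the connection URL with URL-encoded `license` and `name` parameters.
function buildRealtimeUrl(license, name) {
  const url = new URL(REALTIME_ENDPOINT);
  url.searchParams.set('license', license);
  url.searchParams.set('name', name);
  return url.toString();
}

// Route one raw server message to a handler keyed by its event type.
// ASSUMPTION: messages are JSON text with a top-level `type` field.
function dispatchMessage(raw, handlers) {
  const message = JSON.parse(raw);
  const handler = handlers[message.type];
  if (handler) {
    handler(message);
    return true;
  }
  return false; // event types without a handler are ignored
}
```

In a browser, the URL would be passed to new WebSocket(...), and socket.onmessage would call dispatchMessage(event.data, handlers) with a handler registered for conversation.connected.success.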

Real-time WebSocket Connection Issues

Please check:
  • Is the license valid?
  • Is the WebSocket address correct (wss://api.navtalk.ai/realtime-api)?
  • Does Chrome allow microphone access?
Yes, WebRTC is the only method for displaying video. After connecting to the WebSocket, make sure you also establish a WebRTC video channel and bind it to a video tag for playback.
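Binding the WebRTC channel to a video tag might look like the sketch below. It uses the standard ontrack handler of RTCPeerConnection and the srcObject property of the video element; the helper name attachRemoteVideo is hypothetical, and creating the peer connection and exchanging signaling messages are not shown because the source does not describe them.

```javascript
// Bind incoming WebRTC media to a <video> element.
// `peerConnection` is an RTCPeerConnection and `videoElement` an
// HTMLVideoElement, both created elsewhere.
function attachRemoteVideo(peerConnection, videoElement) {
  peerConnection.ontrack = (event) => {
    // The first stream attached to the track event carries the remote media.
    const [stream] = event.streams;
    if (stream && videoElement.srcObject !== stream) {
      videoElement.srcObject = stream;
    }
  };
}
```

The video element usually also needs the autoplay and playsinline attributes so that playback starts without a user gesture on mobile browsers.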

Character and Behavior Settings

Please set this in the prompt field of the realtime.input_config message, for example:
socket.onopen = () => {
  const config = {
    voice: 'cedar',
    prompt: `You are a gentle psychological counselor.
Please respond in zh-CN.
Please greet with "Hello, I am your intelligent assistant."`,
    tools: []  // Optional: Function calling tools (OpenAI models only)
  };
  
  socket.send(JSON.stringify({
    type: 'realtime.input_config',
    data: { content: JSON.stringify(config) }
  }));
};
Yes, you can. Set it with the voice field, for example voice: "nova". Nine voices are supported: nova, alloy, shimmer, coral, echo, ballad, ash, sage, and verse. See Voice Styles for complete descriptions and audio previews.

Context and Memory Issues

Two methods are supported:
  • Embed conversation context in the prompt field of realtime.input_config to simulate full context.
  • Use conversation.item.create to send historical messages (only supports user messages) after receiving the realtime.session.created event.
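The second method could be sketched as follows. The layout of the item object is an assumption modeled on the OpenAI Realtime API and may not match NavTalk's actual schema; the helper names buildHistoryMessages and sendHistory and the { role, text } shape of the stored history are hypothetical.

```javascript
// Turn stored history into conversation.item.create messages.
// ASSUMPTION: the `item` layout mirrors the OpenAI Realtime API.
function buildHistoryMessages(history) {
  return history
    .filter((entry) => entry.role === 'user') // only user messages are supported
    .map((entry) => ({
      type: 'conversation.item.create',
      item: {
        type: 'message',
        role: 'user',
        content: [{ type: 'input_text', text: entry.text }],
      },
    }));
}

// Send the messages; call this after realtime.session.created arrives.
// Returns the number of messages sent.
function sendHistory(socket, history) {
  const messages = buildHistoryMessages(history);
  for (const message of messages) {
    socket.send(JSON.stringify(message));
  }
  return messages.length;
}
```

Assistant turns are filtered out here rather than sent, since only user messages are accepted; if earlier assistant replies matter, they can be summarized in the prompt field instead.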
Please confirm:
  • Does your realtime.input_config message include contextual content in the prompt field?
  • Did you send conversation history using conversation.item.create after receiving realtime.session.created?

Function Call Issues

  • Please confirm that the tools parameter has been correctly registered.
  • Check if you are listening for the response.function_call_arguments.done event.
  • Is the backend correctly returning function_call_output?
After sending the function result, be sure to send the following message:
socket.send(JSON.stringify({ type: "response.create" }));
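The whole round trip might be sketched like this. The event fields name, arguments, and call_id and the function_call_output item layout are assumptions modeled on the OpenAI Realtime API, not confirmed for NavTalk; the helper name handleFunctionCallDone and the toolImplementations map are hypothetical, and tools are assumed to be synchronous for brevity.

```javascript
// Handle one response.function_call_arguments.done event.
// ASSUMPTION: `event.name`, `event.arguments` (a JSON string) and
// `event.call_id` follow the OpenAI Realtime API.
function handleFunctionCallDone(socket, event, toolImplementations) {
  const implementation = toolImplementations[event.name];
  if (!implementation) {
    return false; // no local tool registered under this name
  }
  const args = JSON.parse(event.arguments || '{}');
  const result = implementation(args);

  // 1. Return the tool result to the conversation.
  socket.send(JSON.stringify({
    type: 'conversation.item.create',
    item: {
      type: 'function_call_output',
      call_id: event.call_id,
      output: JSON.stringify(result),
    },
  }));

  // 2. Send response.create after the result, as noted above.
  socket.send(JSON.stringify({ type: 'response.create' }));
  return true;
}
```

Both payloads are passed through JSON.stringify because WebSocket.send transmits strings or binary data, not plain objects.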

Media Interface Call Issues

Generation usually completes within 5 to 30 seconds. Poll the query_status interface at regular intervals until you receive:
{
  "status": "done",
  "video_url": "xxx"
}
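A polling loop for this could be sketched as follows. The source does not give the query_status URL or its parameters, so the request itself is left to a caller-supplied function; queryStatus, pollUntilDone, and the default interval and timeout values are all hypothetical choices, and any failure status the service may return is not handled here.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the task reports status "done", then resolve to its video_url.
// `queryStatus` is a caller-supplied async function that calls the
// query_status interface for one task and resolves to the parsed JSON.
async function pollUntilDone(queryStatus, { intervalMs = 2000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await queryStatus();
    if (result.status === 'done') {
      return result.video_url;
    }
    if (Date.now() >= deadline) {
      throw new Error('Timed out waiting for video generation');
    }
    await sleep(intervalMs);
  }
}
```

The timeout guards against polling forever if a task never reaches the done state; the default of two minutes is simply a generous multiple of the typical 5 to 30 seconds.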
It is recommended to upload audio and video files to a public cloud and use the URL for the call. If you need to use the platform’s upload feature, please log in to the console to get the upload link.