Endpoint
All requests are sent to the same endpoint:

API Call Overview
The following table compares all nine supported methods:

| Method | Visual Source | Audio Source | Use Case |
|---|---|---|---|
| ① Video + Audio URL | Video | Audio URL | Re-dub existing videos |
| ② Video + Audio Base64 | Video | Audio Base64 | Re-dub with local audio |
| ③ Video + Text (TTS) | Video | Text | Re-dub with TTS |
| ④ Image + Text (TTS) | Image | Text | Create talking head from photo |
| ⑤ Image + Audio URL | Image | Audio URL | Sync image with audio |
| ⑥ Image + Audio Base64 | Image | Audio Base64 | Sync image with local audio |
| ⑦ Built-in Character + Audio URL | Built-in | Audio URL | Use preset character with audio |
| ⑧ Built-in Character + Audio Base64 | Built-in | Audio Base64 | Use preset character with local audio |
| ⑨ Built-in Character + Text (TTS) | Built-in | Text | Use preset character with TTS |
Detailed Examples
① Video + Audio URL
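This method pairs video_url with audio_url. The Python sketch below builds the request body and includes a small helper for posting it. It makes three assumptions:

- The API accepts a JSON body.
- `api_key` is a hypothetical field name for the authorization key, because the real name is not shown here.
- `submit()` takes the endpoint URL as an argument and is not called in the sketch.

```python
import json
import urllib.request


def build_payload(api_key: str, video_url: str, audio_url: str) -> dict:
    """Method ①: re-dub an existing video with a hosted audio file."""
    return {
        "api_key": api_key,      # hypothetical field name for the API key
        "video_url": video_url,  # public MP4 or MOV URL
        "audio_url": audio_url,  # public MP3 or WAV URL
    }


def submit(endpoint: str, payload: dict) -> dict:
    """POST the payload as JSON and decode the JSON response.

    Assumes the endpoint accepts application/json; not called in this sketch.
    """
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_payload(
    "sk-xxx", "https://example.com/video.mp4", "https://example.com/audio.mp3"
)
print(json.dumps(payload, indent=2))
```

Pass the endpoint from the Endpoint section to `submit()`. Per Response Handling, the decoded response should carry `task_id` and `status`.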
② Video + Audio Base64
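This method sends the audio inline as Base64 instead of a URL. The sketch reads a local file, encodes it and places the text in `audio_base64`. It makes three assumptions:

- `api_key` is a hypothetical field name.
- The API expects plain Base64 with no `data:` URI prefix.
- A temporary file stands in for a real recording.

```python
import base64
import json
import os
import tempfile


def encode_audio_file(path: str) -> str:
    """Read a local MP3/WAV file and return its bytes as a Base64 string."""
    with open(path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("ascii")


def build_payload(api_key: str, video_url: str, audio_base64: str) -> dict:
    """Method ②: re-dub an existing video with locally stored audio."""
    return {
        "api_key": api_key,            # hypothetical field name for the API key
        "video_url": video_url,
        "audio_base64": audio_base64,  # plain Base64, no data: URI prefix (assumed)
    }


# Stand-in audio file so the sketch runs without a real recording.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"RIFF-demo-audio-bytes")
    demo_path = tmp.name
try:
    audio_b64 = encode_audio_file(demo_path)
finally:
    os.remove(demo_path)

payload = build_payload("sk-xxx", "https://example.com/video.mp4", audio_b64)
print(json.dumps(payload, indent=2))
```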
③ Video + Text (TTS)
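With TTS, `content` replaces the audio source and a voice style becomes mandatory. The sketch checks the voice against the list given under Request Parameters before building the body. Both `api_key` and `voice` are hypothetical field names, because the real names are not shown in this section.

```python
import json

# Voice styles listed under Request Parameters.
SUPPORTED_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
}


def build_payload(api_key: str, video_url: str, content: str, voice: str) -> dict:
    """Method ③: re-dub an existing video with text-to-speech audio."""
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice style: {voice!r}")
    return {
        "api_key": api_key,  # hypothetical field name for the API key
        "video_url": video_url,
        "content": content,  # text the API converts to speech
        "voice": voice,      # hypothetical field name for the voice style
    }


payload = build_payload(
    "sk-xxx",
    "https://example.com/video.mp4",
    "Welcome to NavTalk. This is my first digital human video!",
    "echo",
)
print(json.dumps(payload, indent=2))
```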
④ Image + Text (TTS)
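For a talking head generated from a photo, `image_url` takes the place of `video_url`. This sketch rejects empty text and a missing voice style, since the voice is required whenever `content` is used. As before, `api_key` and `voice` are hypothetical field names.

```python
import json


def build_payload(api_key: str, image_url: str, content: str, voice: str) -> dict:
    """Method ④: turn a photo into a talking head driven by TTS."""
    if not content.strip():
        raise ValueError("content must contain text to synthesize")
    if not voice:
        raise ValueError("a voice style is required when using content")
    return {
        "api_key": api_key,  # hypothetical field name for the API key
        "image_url": image_url,
        "content": content,
        "voice": voice,      # hypothetical field name for the voice style
    }


payload = build_payload(
    "sk-xxx", "https://example.com/photo.jpg", "Hello from NavTalk!", "nova"
)
print(json.dumps(payload, indent=2))
```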
⑤ Image + Audio URL
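Both media inputs here are URLs, and the parameter notes say they must be reachable over HTTP/HTTPS. The sketch checks the URL scheme before building the body. A scheme check cannot confirm that a URL is actually public, so this is only a local sanity check. `api_key` remains a hypothetical field name.

```python
import json
from urllib.parse import urlparse


def require_http_url(url: str) -> str:
    """Reject URLs that are not HTTP(S); does not test public reachability."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return url


def build_payload(api_key: str, image_url: str, audio_url: str) -> dict:
    """Method ⑤: sync a photo with a hosted audio file."""
    return {
        "api_key": api_key,  # hypothetical field name for the API key
        "image_url": require_http_url(image_url),
        "audio_url": require_http_url(audio_url),
    }


payload = build_payload(
    "sk-xxx", "https://example.com/photo.jpg", "https://example.com/audio.mp3"
)
print(json.dumps(payload, indent=2))
```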
⑥ Image + Audio Base64
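When the audio is already in memory, for example just recorded, it can be encoded directly without writing a file. The sketch uses placeholder bytes and makes two assumptions: `api_key` is a hypothetical field name, and the API expects plain Base64 with no `data:` URI prefix.

```python
import base64
import json


def build_payload(api_key: str, image_url: str, audio_bytes: bytes) -> dict:
    """Method ⑥: sync a photo with audio held in memory."""
    return {
        "api_key": api_key,  # hypothetical field name for the API key
        "image_url": image_url,
        # Plain Base64 text (no data: URI prefix) is an assumption.
        "audio_base64": base64.b64encode(audio_bytes).decode("ascii"),
    }


# Placeholder bytes; in practice, read them from an MP3 or WAV file.
payload = build_payload("sk-xxx", "https://example.com/photo.jpg", b"ID3-demo-bytes")
print(json.dumps(payload, indent=2))
```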
⑦ Built-in Character + Audio URL
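Built-in character methods replace the uploaded media with `character_name`. The sketch uses `navtalk.Leo`, the example character from Request Parameters, with a hosted audio file. `api_key` is a hypothetical field name.

```python
import json


def build_payload(api_key: str, character_name: str, audio_url: str) -> dict:
    """Method ⑦: animate a built-in character with hosted audio."""
    return {
        "api_key": api_key,                # hypothetical field name for the API key
        "character_name": character_name,  # e.g. "navtalk.Leo"
        "audio_url": audio_url,            # public MP3 or WAV URL
    }


payload = build_payload("sk-xxx", "navtalk.Leo", "https://example.com/audio.mp3")
print(json.dumps(payload, indent=2))
```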
⑧ Built-in Character + Audio Base64
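This variant reads a local audio file with `pathlib` and sends it as Base64 alongside a built-in character. It makes three assumptions:

- `api_key` is a hypothetical field name.
- The API expects plain Base64 with no `data:` URI prefix.
- A temporary directory holds a stand-in file.

```python
import base64
import json
import tempfile
from pathlib import Path


def build_payload(api_key: str, character_name: str, audio_path: Path) -> dict:
    """Method ⑧: animate a built-in character with a local audio file."""
    audio_b64 = base64.b64encode(audio_path.read_bytes()).decode("ascii")
    return {
        "api_key": api_key,  # hypothetical field name for the API key
        "character_name": character_name,
        "audio_base64": audio_b64,  # plain Base64 (assumed format)
    }


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_audio = Path(tmp_dir) / "greeting.mp3"
    demo_audio.write_bytes(b"ID3-demo-audio-bytes")  # stand-in recording
    payload = build_payload("sk-xxx", "navtalk.Leo", demo_audio)

print(json.dumps(payload, indent=2))
```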
⑨ Built-in Character + Text (TTS)
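This is the simplest combination: no media of your own, only a character name, text and a voice style. In this sketch, `api_key` and `voice` are hypothetical field names, because the real names are not shown in this section.

```python
import json


def build_payload(api_key: str, character_name: str, content: str, voice: str) -> dict:
    """Method ⑨: animate a built-in character with text-to-speech."""
    return {
        "api_key": api_key,  # hypothetical field name for the API key
        "character_name": character_name,
        "content": content,
        "voice": voice,      # hypothetical field name for the voice style
    }


payload = build_payload(
    "sk-xxx",
    "navtalk.Leo",
    "Welcome to NavTalk. This is my first digital human video!",
    "echo",
)
print(json.dumps(payload, indent=2))
```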
Request Parameters
- API key: API authorization key obtained from the NavTalk dashboard. Example: "sk-xxx"
- video_url: Public URL to a video file in MP4 or MOV format. Required for video-driven methods (methods ①, ②, ③). The video must be publicly accessible via HTTP/HTTPS. Example: "https://example.com/video.mp4"
- image_url: Public URL to an image file. Required for image-driven methods (methods ④, ⑤, ⑥). The image must be publicly accessible via HTTP/HTTPS. Example: "https://example.com/photo.jpg"
- character_name: Built-in character name. Required for built-in character methods (methods ⑦, ⑧, ⑨). Available characters include navtalk.Leo and other built-in characters; see Available Avatars for the complete list. Example: "navtalk.Leo"
- audio_url: Public URL to an audio file in MP3 or WAV format. Use this when you have a pre-recorded audio file. The audio must be publicly accessible via HTTP/HTTPS. Example: "https://example.com/audio.mp3"
- audio_base64: Base64-encoded audio data. Use this to send audio data directly without hosting it online. Example: "base64-audio-data"
- content: Text for text-to-speech (TTS) synthesis. The API converts this text to speech using the specified voice style. Example: "Welcome to NavTalk. This is my first digital human video!"
- Voice style: Voice used for text-to-speech synthesis. Required when using the content parameter. Supported voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse. See Voice Styles for complete descriptions and audio previews. Example: "echo"

Each request combines one visual source with one audio source:

- Video-driven: video_url + (audio_url OR audio_base64 OR content)
- Image-driven: image_url + (audio_url OR audio_base64 OR content)
- Built-in character: character_name + (audio_url OR audio_base64 OR content)
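The combination rules above can be checked on the client before a request is sent. This Python sketch deliberately applies a stricter reading than the source states: exactly one field from each group. It also uses `voice` as a hypothetical name for the voice-style parameter. It returns the (visual, audio) pair, which identifies one of the nine methods.

```python
from typing import Tuple

# Field names from the combination rules; "voice" is a hypothetical name.
VISUAL_SOURCES = ("video_url", "image_url", "character_name")
AUDIO_SOURCES = ("audio_url", "audio_base64", "content")


def classify_request(payload: dict) -> Tuple[str, str]:
    """Return the (visual, audio) source pair, or raise ValueError.

    Assumes exactly one non-empty field from each group is allowed.
    """
    visual = [name for name in VISUAL_SOURCES if payload.get(name)]
    audio = [name for name in AUDIO_SOURCES if payload.get(name)]
    if len(visual) != 1:
        raise ValueError(f"expected exactly one visual source, found {visual}")
    if len(audio) != 1:
        raise ValueError(f"expected exactly one audio source, found {audio}")
    if audio[0] == "content" and not payload.get("voice"):
        raise ValueError("a voice style is required when using content")
    return visual[0], audio[0]


print(classify_request({
    "image_url": "https://example.com/photo.jpg",
    "content": "Hello!",
    "voice": "nova",
}))
```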
Response Handling
All requests are processed asynchronously. The API returns a task_id that you use to query the task status.

Submit Request:

- status: Initial status when the task is created. Always "started" in the initial response.
- task_id: Unique identifier for the generation task. Use it to query the task status.

Use the task_id to check processing status:

- status: Current status of the task. Possible values:
  - started: Task created and processing
  - processing: Video composition in progress
  - done: Completed successfully; the video URL is available
  - failed: Generation failed; check the error message
- Video URL: Public URL to the generated video file. Only present when status is "done".

The video URL is publicly accessible and can be embedded directly in web pages or mobile apps, or downloaded for offline use. Generation typically takes 10-30 seconds. Keep videos under 30 seconds for faster processing.
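Because generation is asynchronous, a client typically polls with the task_id until the status becomes "done" or "failed". The Python sketch below assumes the following:

- `fetch_status` is a hypothetical callable you supply that queries the task and returns the decoded JSON as a dict. The query endpoint is not shown here.
- `video_url` is used as the name of the result field, which is an assumption.
- The 3-second interval and 120-second timeout are arbitrary choices.

```python
import time
from typing import Callable, Dict

PENDING = {"started", "processing"}


def wait_for_video(
    task_id: str,
    fetch_status: Callable[[str], Dict],
    interval: float = 3.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the task reaches "done" (return it) or "failed" (raise)."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status(task_id)
        status = result.get("status")
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {result}")
        if status not in PENDING:
            raise RuntimeError(f"task {task_id} returned unexpected status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout} s")
        sleep(interval)


# Simulated responses stand in for real status queries.
fake_responses = iter([
    {"status": "started"},
    {"status": "processing"},
    {"status": "done", "video_url": "https://example.com/result.mp4"},
])
result = wait_for_video("demo-task", lambda _task_id: next(fake_responses),
                        sleep=lambda _seconds: None)
print(result["video_url"])
```

Injecting `sleep` keeps the loop testable without real waiting. In production, leave it as `time.sleep`.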
Advanced Parameters
NavTalk supports optional advanced parameters for fine-tuning face cropping, mouth openness, and blending. These parameters are inherited from MuseTalk and should be used only when needed.

- Face crop vertical shift: Vertical movement of the face crop box. Positive values shift the crop downward (making the mouth more open), while negative values shift it upward (making the mouth less open). Range: [-9, 9]. Example: 0
- Extra margin: Pixels of extra margin added around the face crop. Increases the buffer area to prevent clipping of the chin, hair, or jaw. Range: [0, 50]. Example: 10
- Parsing mode: Defines how facial regions, especially around the jawline, are parsed and blended. Options: "jaw" or "raw". Example: "jaw"
- left_cheek_width: Pixel width of the blending region on the left cheek. Widen it to soften seam visibility. Range: [50, 150]. Example: 90
- Right cheek width: Pixel width of the blending region on the right cheek. Functions the same as left_cheek_width. Range: [50, 150]. Example: 90

These parameters are optional. Default values work well for most cases. Only adjust them if you observe issues such as a face crop that is too tight or too loose, or visible seams along the cheeks.
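Out-of-range values are easiest to catch before submission. This Python sketch checks the documented ranges and options. Only `left_cheek_width` is named in this section. The names `bbox_shift`, `extra_margin`, `parsing_mode` and `right_cheek_width` are assumptions taken from MuseTalk's inference options, and NavTalk may name these fields differently.

```python
# Assumed parameter names (MuseTalk-style); only left_cheek_width is
# confirmed by this section.
NUMERIC_RANGES = {
    "bbox_shift": (-9, 9),           # vertical face-crop shift
    "extra_margin": (0, 50),         # pixels added around the face crop
    "left_cheek_width": (50, 150),   # blending width, left cheek
    "right_cheek_width": (50, 150),  # blending width, right cheek
}
PARSING_MODES = {"jaw", "raw"}


def check_advanced(params: dict) -> dict:
    """Raise ValueError if any advanced parameter is unknown or out of range."""
    for name, value in params.items():
        if name == "parsing_mode":
            if value not in PARSING_MODES:
                raise ValueError(f"parsing_mode must be 'jaw' or 'raw', got {value!r}")
        elif name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")
        else:
            raise ValueError(f"unknown advanced parameter: {name!r}")
    return params


advanced = check_advanced({
    "bbox_shift": 0, "extra_margin": 10, "parsing_mode": "jaw",
    "left_cheek_width": 90, "right_cheek_width": 90,
})
print(advanced)
```

Rejecting unknown names guards against typos silently falling back to the defaults.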