Video Synthesis API

The NavTalk Video Synthesis API supports 9 different methods for generating digital human videos, grouped into three main types: image-driven, video-driven, and built-in character-driven.

Endpoint

All requests are sent to the same endpoint:
POST https://app.navtalk.ai/generate

API Call Overview

The following table provides a quick comparison of all 9 supported methods:
| Method | Visual Source | Audio Source | Use Case |
|---|---|---|---|
| ① Video + Audio URL | Video | Audio URL | Re-dub existing videos |
| ② Video + Audio Base64 | Video | Audio Base64 | Re-dub with local audio |
| ③ Video + Text (TTS) | Video | Text | Re-dub with TTS |
| ④ Image + Text (TTS) | Image | Text | Create talking head from photo |
| ⑤ Image + Audio URL | Image | Audio URL | Sync image with audio |
| ⑥ Image + Audio Base64 | Image | Audio Base64 | Sync image with local audio |
| ⑦ Built-in Character + Audio URL | Built-in | Audio URL | Use preset character with audio |
| ⑧ Built-in Character + Audio Base64 | Built-in | Audio Base64 | Use preset character with local audio |
| ⑨ Built-in Character + Text (TTS) | Built-in | Text | Use preset character with TTS |

Detailed Examples

① Video + Audio URL:
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "video_url": "https://example.com/video.mp4",
    "audio_url": "https://example.com/audio.mp3"
  }'
② Video + Audio Base64:
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "video_url": "https://example.com/video.mp4",
    "audio_base64": "base64-audio-data"
  }'
③ Video + Text (TTS):
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "video_url": "https://example.com/video.mp4",
    "content": "Welcome to NavTalk.",
    "voice": "nova"
  }'
④ Image + Text (TTS):
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "image_url": "https://example.com/photo.jpg",
    "content": "Welcome to NavTalk.",
    "voice": "echo"
  }'
⑤ Image + Audio URL:
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "image_url": "https://example.com/photo.jpg",
    "audio_url": "https://example.com/audio.mp3"
  }'
⑥ Image + Audio Base64:
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "image_url": "https://example.com/photo.jpg",
    "audio_base64": "base64-audio-data"
  }'
⑦ Built-in Character + Audio URL:
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "character_name": "navtalk.Leo",
    "audio_url": "https://example.com/audio.mp3"
  }'
⑧ Built-in Character + Audio Base64:
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "character_name": "navtalk.Leo",
    "audio_base64": "base64-audio-data"
  }'
⑨ Built-in Character + Text (TTS):
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "character_name": "navtalk.Leo",
    "content": "Welcome to NavTalk.",
    "voice": "fable"
  }'

Request Parameters

license
string
required
API authorization key obtained from the NavTalk dashboard.
Example: "sk-xxx"
video_url
string
Public URL to a video file in MP4 or MOV format. Required for video-driven methods (methods ①, ②, ③). The video must be publicly accessible via HTTP/HTTPS.
Example: "https://example.com/video.mp4"
image_url
string
Public URL to an image file. Required for image-driven methods (methods ④, ⑤, ⑥). The image must be publicly accessible via HTTP/HTTPS.
Example: "https://example.com/photo.jpg"
character_name
string
Built-in character name. Required for built-in character methods (methods ⑦, ⑧, ⑨).
Available characters: navtalk.Leo and other built-in characters. See Available Avatars for the complete list.
Example: "navtalk.Leo"
audio_url
string
Public URL to an audio file in MP3 or WAV format. Use this when you have a pre-recorded audio file. The audio must be publicly accessible via HTTP/HTTPS.
Example: "https://example.com/audio.mp3"
audio_base64
string
Base64-encoded audio data. Use this when you want to send audio data directly without hosting it online.
Example: "base64-audio-data"
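For audio_base64, a local file has to be read and encoded before it goes into the JSON body. The following is a minimal sketch in Python, not an official client. It assumes the API expects plain standard Base64 with no data-URI prefix, which the parameter description does not spell out. The helper names encode_audio_base64 and build_image_audio_payload are hypothetical.

```python
import base64
from pathlib import Path


def encode_audio_base64(path: str) -> str:
    """Read a local audio file and return its bytes as a Base64 string."""
    raw = Path(path).read_bytes()
    # Assumption: plain standard Base64, without a "data:audio/...;base64," prefix.
    return base64.b64encode(raw).decode("ascii")


def build_image_audio_payload(license_key: str, image_url: str, audio_path: str) -> dict:
    """Assemble a request body for method 6 (image + Base64 audio)."""
    return {
        "license": license_key,
        "image_url": image_url,
        "audio_base64": encode_audio_base64(audio_path),
    }
```

The resulting dictionary can be serialized with json.dumps and sent as the request body in place of the "base64-audio-data" placeholder.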
content
string
Text content for text-to-speech (TTS) synthesis. The API will convert this text to speech using the specified voice style.
Example: "Welcome to NavTalk. This is my first digital human video!"
voice
string
Voice style for text-to-speech synthesis. Required when using the content parameter.
Supported voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
See Voice Styles for complete descriptions and audio previews.
Example: "echo"
Parameter Combinations: Choose one visual source and one audio source:
  • Video-driven: video_url + (audio_url OR audio_base64 OR content)
  • Image-driven: image_url + (audio_url OR audio_base64 OR content)
  • Built-in character: character_name + (audio_url OR audio_base64 OR content)
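The combination rules above can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python. The function name build_payload is hypothetical, and how the server reacts to a body with two visual or two audio sources is not documented here, so the strict "exactly one" check is an assumption made on the client side.

```python
from typing import Optional

VISUAL_KEYS = ("video_url", "image_url", "character_name")
AUDIO_KEYS = ("audio_url", "audio_base64", "content")
SUPPORTED_VOICES = {
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
}


def build_payload(license_key: str, voice: Optional[str] = None, **sources: str) -> dict:
    """Return a request body with exactly one visual and one audio source."""
    unknown = set(sources) - set(VISUAL_KEYS) - set(AUDIO_KEYS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    visual = [key for key in VISUAL_KEYS if sources.get(key)]
    audio = [key for key in AUDIO_KEYS if sources.get(key)]
    if len(visual) != 1:
        raise ValueError("choose exactly one of video_url, image_url, character_name")
    if len(audio) != 1:
        raise ValueError("choose exactly one of audio_url, audio_base64, content")
    payload = {
        "license": license_key,
        visual[0]: sources[visual[0]],
        audio[0]: sources[audio[0]],
    }
    if audio[0] == "content":
        # voice is required together with content; it is left out for audio inputs.
        if voice not in SUPPORTED_VOICES:
            raise ValueError(f"voice must be one of {sorted(SUPPORTED_VOICES)}")
        payload["voice"] = voice
    return payload
```

For example, build_payload("sk-xxx", voice="fable", character_name="navtalk.Leo", content="Welcome to NavTalk.") yields the same body as the method ⑨ request.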

Response Handling

All requests are processed asynchronously. The API returns a task_id that you use to query the status.
Submit Request:
curl -X POST "https://app.navtalk.ai/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "license": "sk-xxx",
    "character_name": "navtalk.Leo",
    "content": "Welcome to NavTalk.",
    "voice": "echo"
  }'
Response:
{
  "status": "started",
  "task_id": "14cb760f-05ac-4fd3-a82c-e841f2f005d0"
}
status
string
Initial status when the task is created. Always "started" in the initial response.
task_id
string
Unique identifier for the generation task. Use this to query the task status.
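The submit step can also be scripted. The following is a rough Python equivalent of the curl call, using only the standard library. The function names are hypothetical, and because the shape of an error response is not described here, the sketch simply treats anything other than a "started" status with a task_id as unexpected.

```python
import json
import urllib.request

GENERATE_URL = "https://app.navtalk.ai/generate"


def make_generate_request(payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_submit_response(raw: bytes) -> str:
    """Extract task_id from the initial response body."""
    data = json.loads(raw)
    # Assumption: any other shape signals an error; the error format is not documented here.
    if data.get("status") != "started" or "task_id" not in data:
        raise RuntimeError(f"unexpected response: {data!r}")
    return data["task_id"]


def submit(payload: dict, timeout: float = 30.0) -> str:
    """Send the generation request and return the task_id."""
    request = make_generate_request(payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_submit_response(resp.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to check without contacting the service.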
Query Status: Use the task_id to check processing status:
curl -X GET "https://api.navtalk.ai/query_status?license=YOUR_LICENSE&task_id=14cb760f-05ac-4fd3-a82c-e841f2f005d0"
Response:
{
  "status": "done",
  "video_url": "https://easyaistorageaccount.blob.core.windows.net/easyai/uploadFiles/2025/05/09/xxx.mp4"
}
status
string
Current status of the task. Possible values:
  • started: Task created and processing
  • processing: Video composition in progress
  • done: Completed successfully, video URL available
  • failed: Generation failed, check error message
video_url
string
Public URL to the generated video file. Only present when status is "done". The video URL is publicly accessible and can be embedded directly in web pages, mobile apps, or downloaded for offline use.
Generation typically takes 10-30 seconds. Keep videos under 30 seconds for faster processing.
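Since results arrive asynchronously, a client usually polls the status endpoint until the task reaches "done" or "failed". The loop below is a minimal Python sketch under a few assumptions: the host and query parameters are copied from the query example above, while the 3-second interval, the 120-second cap, and the function names are illustrative choices rather than documented values.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable

QUERY_URL = "https://api.navtalk.ai/query_status"


def status_url(license_key: str, task_id: str) -> str:
    """Build the status query URL with properly encoded parameters."""
    query = urllib.parse.urlencode({"license": license_key, "task_id": task_id})
    return f"{QUERY_URL}?{query}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read())


def wait_for_video(
    license_key: str,
    task_id: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 3.0,    # assumed polling interval, not a documented value
    max_wait: float = 120.0,  # assumed upper bound, well above the typical 10-30 s
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task is done and return the generated video URL."""
    url = status_url(license_key, task_id)
    waited = 0.0
    while True:
        result = fetch(url)
        status = result.get("status")
        if status == "done":
            return result["video_url"]
        if status == "failed":
            raise RuntimeError(f"generation failed: {result!r}")
        if waited >= max_wait:
            raise TimeoutError(f"task {task_id} still {status!r} after {max_wait} s")
        sleep(interval)
        waited += interval
```

The fetch and sleep arguments are injectable so the loop can be exercised with canned responses instead of live requests.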

Advanced Parameters

NavTalk supports optional advanced parameters for fine-tuning face cropping, mouth openness, and blending. These parameters are inherited from MuseTalk and should be used only when needed.
bbox_shift
number
default: 0
Vertical movement of the face crop box. Positive values shift the crop downward (making the mouth more open), while negative values shift upward (making the mouth less open).
Range: [-9, 9]
Example: 0
extra_margin
number
default: 10
Pixels of extra margin added around the face crop. Increases buffer area to prevent clipping of chin, hair, or jaw.
Range: [0, 50]
Example: 10
parsing_mode
string
default: "jaw"
Defines how facial regions, especially around the jawline, are parsed and blended.
Options: "jaw" or "raw"
Example: "jaw"
left_cheek_width
number
default: 90
Pixel width for blending region on the left cheek. Adjust wider to soften seam visibility.
Range: [50, 150]
Example: 90
right_cheek_width
number
default: 90
Pixel width for blending region on the right cheek. Functions the same as left_cheek_width.
Range: [50, 150]
Example: 90
These parameters are optional. Default values work well for most cases. Only adjust if you observe issues like face crop being too tight/loose or visible seams along the cheeks.
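When these values are adjusted, it can help to validate them against the documented ranges before sending. The sketch below does that in Python. It assumes the advanced parameters are passed as top-level fields in the same JSON body as the other request parameters, which this page does not state explicitly, and the helper name with_advanced is hypothetical.

```python
# Documented ranges for the numeric advanced parameters.
ADVANCED_RANGES = {
    "bbox_shift": (-9, 9),
    "extra_margin": (0, 50),
    "left_cheek_width": (50, 150),
    "right_cheek_width": (50, 150),
}
PARSING_MODES = ("jaw", "raw")


def with_advanced(payload: dict, **overrides) -> dict:
    """Return a copy of payload with validated advanced parameters added."""
    merged = dict(payload)
    for name, value in overrides.items():
        if name == "parsing_mode":
            if value not in PARSING_MODES:
                raise ValueError(f"parsing_mode must be one of {PARSING_MODES}")
        elif name in ADVANCED_RANGES:
            low, high = ADVANCED_RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be within [{low}, {high}]")
        else:
            raise ValueError(f"unknown advanced parameter: {name}")
        # Assumption: advanced parameters sit at the top level of the request body.
        merged[name] = value
    return merged
```

For instance, with_advanced(body, extra_margin=20, left_cheek_width=110) widens the crop margin and the left blending region while leaving the original body untouched.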