Overview

The NavTalk Video Synthesis API lets you create digital human videos through a public REST interface. You can submit a task with an avatar, image, or source video, then poll the task status and retrieve the finished video URL when processing completes.

What This API Supports

Three REST Endpoints

Submit a task with POST /api/open/v1/video-compose/submit, then use GET /status to poll its progress and GET /list to retrieve task history.

Flexible Audio Input

Provide an uploaded audio file, a public audio URL, or plain text for TTS. Exactly one audio input method is required per task.

Flexible Character Input

Provide an uploaded image or video, a public media URL, or a trained avatarId. Exactly one visual input method is required per task.

Avatar-Based TTS

When you use textContent, the API automatically uses the TTS provider and voice bound to the selected avatar. Supported providers are OpenAI, ElevenLabs, and Cartesia.

Asynchronous Workflow

Video generation runs as a background task. The submit endpoint returns a taskId, and you poll that task until its status moves from Processing to Published or Fail.
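The polling side of this workflow can be sketched as a small loop. This is a minimal sketch, not the official client: fetch_status is a hypothetical callable that would wrap GET /status and return the status string for a taskId (the response shape is not described on this page, so that is an assumption), and the interval and attempt limit are illustrative defaults.

```python
import time

# Status values named in this overview. A task stays in "Processing"
# until it reaches one of the two terminal states.
TERMINAL_STATUSES = {"Published", "Fail"}


def wait_for_task(fetch_status, task_id, interval=5.0, max_attempts=60,
                  sleep=time.sleep):
    """Poll until the task reaches a terminal status, then return it.

    fetch_status is a hypothetical callable (task_id -> status string);
    in real use it would call GET /status with your API key.
    """
    for attempt in range(max_attempts):
        status = fetch_status(task_id)
        if status in TERMINAL_STATUSES:
            return status
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(
        f"task {task_id} still not finished after {max_attempts} polls"
    )
```

Passing fetch_status and sleep as parameters keeps the loop independent of any HTTP library and makes it easy to exercise without network access.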

Public URLs in Responses

The /status and /list responses return full public URLs for source audio, source video, thumbnails, and the final generated result.

Authentication

This module uses the same authentication model as the other open APIs.

Recommended: pass your API key through the license request header
Compatible fallback: pass your API key as the license query parameter
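The two options above can be illustrated with Python's standard library. This is a sketch under stated assumptions: BASE_URL is a placeholder host (the real host is not given on this page), and build_request is a hypothetical helper that only constructs the request object without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder host: an assumption for illustration, not the real API host.
BASE_URL = "https://api.example.com"
SUBMIT_PATH = "/api/open/v1/video-compose/submit"


def build_request(path, api_key, method="GET", use_header=True):
    """Build (but do not send) a request that carries the API key.

    use_header=True  -> recommended: `license` request header
    use_header=False -> fallback: `license` query parameter
    """
    url = BASE_URL + path
    if use_header:
        return Request(url, headers={"license": api_key}, method=method)
    return Request(url + "?" + urlencode({"license": api_key}), method=method)
```

The header form keeps the key out of URLs, which tend to end up in access logs and browser history; that is a common reason to prefer it over the query parameter.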

Input Rules

Each task must choose exactly one audio source and exactly one visual source.

Audio source

audioFile
audioUrl
textContent

Visual source

characterFile
characterUrl
avatarId

If you use textContent, avatarId is required, because the API needs the avatar’s configured voice and provider to synthesize speech before video generation starts.
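These rules are easy to check on the client before submitting. The following is a minimal sketch: the field names come from the lists above, while validate_task_inputs is a hypothetical helper, and treating empty values as "not provided" is an assumption about how you build the request.

```python
AUDIO_FIELDS = ("audioFile", "audioUrl", "textContent")
VISUAL_FIELDS = ("characterFile", "characterUrl", "avatarId")


def validate_task_inputs(fields):
    """Return a list of rule violations; an empty list means the input is valid.

    `fields` maps field names to values. Missing or empty values are
    treated as not provided (an assumption of this sketch).
    """
    errors = []
    audio = [name for name in AUDIO_FIELDS if fields.get(name)]
    visual = [name for name in VISUAL_FIELDS if fields.get(name)]

    if len(audio) != 1:
        errors.append(f"expected exactly one audio source, got {len(audio)}")
    if len(visual) != 1:
        errors.append(f"expected exactly one visual source, got {len(visual)}")
    # textContent needs the avatar's bound voice and provider for TTS.
    if "textContent" in audio and "avatarId" not in visual:
        errors.append("textContent requires avatarId")
    return errors
```

For example, textContent combined with characterUrl satisfies the one-audio, one-visual rule but still fails, because there is no avatar to supply a voice.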

Recommended Usage

Use avatarId + textContent when you want the cleanest TTS workflow and voice consistency.
Use audioFile + avatarId when you already have recorded speech and want to lip-sync it to a trained avatar.
Use characterFile or characterUrl for quick one-off synthesis without training an avatar first.

Next Steps

Start with the Quick Start
Review all request and response fields in the API Reference
Learn how avatar-bound TTS works in Avatar Voice and TTS Providers
