To train a high-quality custom avatar, provide training videos that meet the quality and content requirements below.

Video Requirements

Quality Requirements:
  • Resolution: 1920x1080 (1080p) or lower
  • Frame Rate: 24fps minimum, 30fps recommended
  • Format: MP4, MOV, or other standard video formats
  • Duration: At least 2 seconds
  • File Size: Recommended under 50MB for faster upload
Content Requirements:
  • Person’s face clearly visible throughout, facing the camera (front-facing or slight angles)
  • Close-up to medium shot (face should occupy a significant portion of the frame)
  • Stable camera work with consistent, good lighting
  • Face remains in focus throughout
  • Only one person in the frame
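
The quality requirements above can be checked before upload with a small script. The following is a minimal sketch, not an official tool: `VideoInfo` and `check_video` are hypothetical names, the metadata is assumed to have been read already with a tool of your choice, width and height are compared directly against 1920x1080, and 50MB is assumed to mean 50 * 1024 * 1024 bytes. Hard requirements are reported as errors; recommendations are reported as warnings.

```python
from dataclasses import dataclass
from pathlib import Path

# Limits taken from the quality requirements list.
MAX_WIDTH, MAX_HEIGHT = 1920, 1080
MIN_FPS = 24.0
RECOMMENDED_FPS = 30.0
MIN_DURATION_S = 2.0
# Assumption: "50MB" is read as 50 * 1024 * 1024 bytes.
RECOMMENDED_MAX_BYTES = 50 * 1024 * 1024
# Assumption: only MP4 and MOV are named explicitly, so other
# extensions produce a warning rather than an error.
KNOWN_EXTENSIONS = {".mp4", ".mov"}


@dataclass
class VideoInfo:
    """Hypothetical container for metadata read elsewhere."""
    filename: str
    width: int
    height: int
    fps: float
    duration_s: float
    size_bytes: int


def check_video(info: VideoInfo) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the listed quality requirements."""
    errors: list[str] = []
    warnings: list[str] = []

    # Assumption: width and height are compared directly (landscape).
    if info.width > MAX_WIDTH or info.height > MAX_HEIGHT:
        errors.append(f"resolution {info.width}x{info.height} exceeds 1920x1080")

    if info.fps < MIN_FPS:
        errors.append(f"frame rate {info.fps} is below the 24fps minimum")
    elif info.fps < RECOMMENDED_FPS:
        warnings.append(f"frame rate {info.fps} is below the recommended 30fps")

    if info.duration_s < MIN_DURATION_S:
        errors.append(f"duration {info.duration_s}s is shorter than 2 seconds")

    if info.size_bytes > RECOMMENDED_MAX_BYTES:
        warnings.append("file is larger than the recommended 50MB")

    if Path(info.filename).suffix.lower() not in KNOWN_EXTENSIONS:
        warnings.append("format is not MP4 or MOV; confirm it is supported")

    return errors, warnings
```

A video with no errors meets the listed minimums; warnings only flag departures from the recommendations. The content requirements (face visibility, framing, lighting, focus) are not covered by this sketch and still need a visual check.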

Reference Video Example

The following video demonstrates the recommended format and quality for training videos:

Recommendations

For optimal training results:
  • Include natural expressions, subtle head movements, and eye contact with the camera
  • Have the person speak or make mouth movements (ideal but not required)
  • Use a video that can loop naturally from end to beginning for smoother synthesis results
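
One rough way to judge whether a clip can loop naturally is to compare its first and last frames. The sketch below is an illustration under stated assumptions, not part of the training pipeline: `loop_difference` is a hypothetical helper, and each frame is assumed to have been decoded elsewhere into a flat sequence of 8-bit grayscale values (0 to 255). It returns the mean absolute pixel difference scaled to the range 0 to 1, where lower values suggest a smoother loop.

```python
from typing import Sequence


def loop_difference(first_frame: Sequence[int], last_frame: Sequence[int]) -> float:
    """Mean absolute difference between two frames, scaled to 0..1.

    Hypothetical helper: frames are flat sequences of 8-bit grayscale
    pixel values of equal, non-zero length.
    """
    if not first_frame or len(first_frame) != len(last_frame):
        raise ValueError("frames must be non-empty and the same size")
    total = sum(abs(a - b) for a, b in zip(first_frame, last_frame))
    # 255 is the largest possible difference per 8-bit pixel.
    return total / (255 * len(first_frame))
```

What counts as "low enough" depends on the footage, so any threshold should be chosen by inspecting a few clips rather than taken as a fixed rule.
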
The model is still being optimized for complex facial details such as beards. Portraits with beards or other complex facial hair may appear blurry or distorted in the synthesis results, so we recommend portraits with little or no facial hair.

AI-Generated Video Support

You can use AI-generated video characters as training material. These videos should meet the same quality requirements as regular footage: 1920x1080 or lower, clear facial features, consistent appearance, good lighting, and stable video. AI-generated videos are useful for creating avatars of fictional characters, testing workflows, creating consistent digital personas, and prototyping.