How to select the model and voice

  • The Models & Voice section controls how your voice agent listens, thinks, and speaks during conversations.

  • First choose the Pipeline Mode:

    • The STT + LLM + TTS pipeline gives you more control: you configure speech-to-text, the language model (the agent's brain), and text-to-speech separately.

    • The Realtime model uses a single unified setup, which is faster to configure.

  • TTS (Text-to-Speech) decides how your agent sounds. Select the voice model, then choose a voice that matches your brand and audience.

  • LLM (Large Language Model) is the brain of the agent. It controls response quality, response speed, and how well the agent follows instructions. Choose a model by weighing speed and cost against the complexity of the tasks the agent must handle.

  • STT (Speech-to-Text) converts the user's speech into text. Select a high-accuracy model and the language your callers speak so that the agent transcribes them correctly.

  • Enable Noise Cancellation to remove background noise and improve call clarity, especially when callers are in noisy environments.

  • Configuring the pipeline mode, models, voice, and noise settings properly helps your voice agent sound natural, intelligent, and professional.
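
As a rough illustration of how the settings above fit together, the sketch below models them as a small Python data structure and checks which fields are still unset for the chosen pipeline mode. Every name here (VoiceAgentSettings, missing_fields, the field names, and the example model identifiers) is hypothetical and is not taken from the product; the sketch also assumes that Realtime mode needs only a single model choice, which this page does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VoiceAgentSettings:
    """Hypothetical container mirroring the Models & Voice options."""
    pipeline_mode: str                    # "pipeline" (STT + LLM + TTS) or "realtime"
    stt_model: Optional[str] = None       # speech-to-text model
    stt_language: Optional[str] = None    # language the callers speak
    llm_model: Optional[str] = None       # the agent's "brain"
    tts_model: Optional[str] = None       # text-to-speech voice model
    tts_voice: Optional[str] = None       # voice chosen within that model
    realtime_model: Optional[str] = None  # used only in realtime mode
    noise_cancellation: bool = False


def missing_fields(settings: VoiceAgentSettings) -> List[str]:
    """Return the names of fields still unset for the chosen pipeline mode."""
    if settings.pipeline_mode == "pipeline":
        required = ["stt_model", "stt_language", "llm_model", "tts_model", "tts_voice"]
    elif settings.pipeline_mode == "realtime":
        # Assumption: realtime mode needs only one model selection.
        required = ["realtime_model"]
    else:
        raise ValueError(f"unknown pipeline mode: {settings.pipeline_mode!r}")
    return [name for name in required if not getattr(settings, name)]


# Example: a pipeline setup where the TTS choices have not been made yet.
settings = VoiceAgentSettings(
    pipeline_mode="pipeline",
    stt_model="example-stt",
    stt_language="en",
    llm_model="example-llm",
    noise_cancellation=True,
)
print(missing_fields(settings))  # ['tts_model', 'tts_voice']
```

The check reflects the main trade-off described above: the separate pipeline asks for more choices in exchange for more control, while the realtime option asks for fewer.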
