Multi-speaker dialogue audio API

ElevenLabs Text-to-Dialogue v3 API

Turn scripted dialogue into natural multi-voice audio.

The ElevenLabs Text-to-Dialogue v3 API converts a sequence of dialogue turns — each with its own voice — into a single expressive audio file. Built for conversations, it captures emotion, pacing, and turn-taking across 70+ languages.


Up to 10 voices per request

Up to 5000 characters

70+ languages

ElevenLabs

elevenlabs-text-to-dialogue-v3 (async task)

Model capabilities

Built for production API workflows, not one-off demos.

Call ElevenLabs Text-to-Dialogue v3 through the same platform as the rest of your video and image stack: API keys, credits, logs, webhooks, and docs stay consistent across providers.

Per-line voice control

Assign a distinct ElevenLabs voice to each dialogue turn for believable multi-character conversations.

Expressive delivery

Eleven v3 captures emotion, emphasis, and natural turn-taking for lifelike back-and-forth dialogue.

Multilingual

Auto-detect or pin the language across 70+ supported languages, with consistent voices throughout.

Stability control

Tune the stability parameter to balance consistency against expressive variation between generations.

API workflow

Submit tasks, track progress, and retrieve generated assets.

Submit a dialogue task

Send a POST request to the audio endpoint with a dialogue array of { text, voice } turns, plus the optional stability, language_code, and callback URL fields.

Track async progress

Use the returned task ID to poll status, or let the callback deliver completion and failure events.

Receive hosted audio

Completed tasks return a mirrored audio URL through the same response shape used by other models.

POST /v1/audio/generations

curl -X POST https://api.aivideoapi.ai/v1/audio/generations \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-text-to-dialogue-v3",
    "input": {
      "dialogue": [
        { "text": "Hey! Did you finish the report?", "voice": "21m00Tcm4TlvDq8ikWAM" },
        { "text": "Almost — just adding the final numbers.", "voice": "AZnzlk1XvdvUeBnXmlld" }
      ],
      "stability": 0.5,
      "language_code": "auto"
    }
  }'
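The three workflow steps above (submit, track, receive) can be chained in a small shell script. This is a minimal sketch under stated assumptions: the callback_url request field, the GET /v1/tasks/{task_id} status endpoint, the completed and failed status values, and the task_id, status, and audio_url response fields are hypothetical placeholders that this page does not confirm, so check the docs for the real names. JSON is read with sed on the assumption of simple, flat responses, which avoids a jq dependency.

```shell
#!/bin/sh
# Sketch of the async workflow: submit -> poll -> read the hosted audio URL.
# HYPOTHETICAL names (not confirmed by this page): the "callback_url" request
# field, the GET /v1/tasks/{task_id} status endpoint, the "completed"/"failed"
# status values, and the "task_id"/"status"/"audio_url" response fields.
API_BASE="https://api.aivideoapi.ai"
API_KEY="${API_KEY:-sk-your-api-key}"

# Print the first "key": "string" value found in JSON read from stdin.
# Naive line-based parsing: assumes no escaped quotes and key/value on one line.
json_string_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Submit a dialogue task that also registers a completion callback.
# Usage: submit_dialogue CALLBACK_URL
submit_dialogue() {
  curl -sS -X POST "$API_BASE/v1/audio/generations" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "elevenlabs-text-to-dialogue-v3",
      "callback_url": "'"$1"'",
      "input": {
        "dialogue": [
          { "text": "Hey! Did you finish the report?", "voice": "21m00Tcm4TlvDq8ikWAM" },
          { "text": "Almost, just adding the final numbers.", "voice": "AZnzlk1XvdvUeBnXmlld" }
        ],
        "stability": 0.5,
        "language_code": "auto"
      }
    }'
}

# Default status fetcher (hypothetical endpoint).
fetch_task() {
  curl -sS "$API_BASE/v1/tasks/$1" -H "Authorization: Bearer $API_KEY"
}

# Poll with FETCH_CMD until the task reaches a terminal state, then print the
# final response. FETCH_CMD is a parameter so the loop can run against a stub.
# Usage: poll_until_done FETCH_CMD TASK_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
poll_until_done() {
  local fetch_cmd="$1" task_id="$2" interval="${3:-5}" max_attempts="${4:-60}"
  local attempt=1 response status
  while [ "$attempt" -le "$max_attempts" ]; do
    response=$("$fetch_cmd" "$task_id")
    status=$(printf '%s\n' "$response" | json_string_field status)
    case "$status" in
      completed|failed) printf '%s\n' "$response"; return 0 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timed out waiting for task $task_id" >&2
  return 1
}
```

In practice you would chain the pieces, for example task_id=$(submit_dialogue https://example.com/hooks/dialogue | json_string_field task_id), then run poll_until_done fetch_task "$task_id" and read audio_url from the final response after confirming its status is completed rather than failed. When a callback URL is registered, polling serves only as a fallback.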

What teams build

Podcast and audio drama dialogue

Game and NPC conversations

Audiobook character voices

Language-learning dialogues

IVR and voice agent scripts

Explainer and ad voiceovers

Frequently asked questions

Answers about the ElevenLabs Text-to-Dialogue v3 API.

What is the ElevenLabs Text-to-Dialogue v3 API?

It is a multi-speaker text-to-speech API powered by ElevenLabs Eleven v3 and exposed through AI Video API. You provide an ordered list of dialogue turns, each with its own voice, and receive a single hosted audio file that renders the full conversation. Generation is asynchronous: results arrive through a callback or by polling the task status.

How is the Text-to-Dialogue v3 API priced?

Pricing is per character, based on the combined length of all dialogue text in a request. Open the playground to see the exact per-character credit rate and a live cost estimate. Credits are pre-charged on submit and automatically refunded if generation fails.
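The pricing rule above reduces to simple arithmetic: total characters across all turns multiplied by the per-character rate. A minimal sketch, assuming the rate (a placeholder you read from the playground) applies linearly with no rounding or minimum charge, and that characters are counted as Unicode characters; neither detail is confirmed on this page.

```shell
#!/bin/sh
# Rough credit estimate, assuming billing = (total characters across all
# dialogue turns) x (per-character rate shown in the playground).
# ASSUMPTIONS: linear pricing with no rounding or minimum charge, and
# characters counted the way `wc -m` counts them (run under a UTF-8 locale
# so multi-byte characters such as accented letters count once).

# Usage: total_dialogue_chars TEXT [TEXT ...]
total_dialogue_chars() {
  local total=0 text n
  for text in "$@"; do
    # tr strips the padding some wc implementations add
    n=$(printf '%s' "$text" | wc -m | tr -d ' ')
    total=$((total + n))
  done
  echo "$total"
}

# Usage: estimate_credits RATE_PER_CHAR TEXT [TEXT ...]
estimate_credits() {
  local rate="$1"
  shift
  awk -v c="$(total_dialogue_chars "$@")" -v r="$rate" \
    'BEGIN { printf "%.4f\n", c * r }'
}
```

For example, estimate_credits RATE "first line" "second line" prints the estimated cost for a two-turn request; substitute the rate from the playground for RATE. Because credits are pre-charged on submit, running this estimate first helps avoid surprises on large scripts.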

How many voices and characters can one request use?

A single request supports up to 10 distinct voices and a combined total of 5000 characters across all dialogue turns. Each turn specifies its own ElevenLabs voice ID or preset name.
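A client-side pre-flight check against these two limits can catch oversized requests before you submit them. This is a sketch: the alternating VOICE TEXT argument convention is hypothetical, and it assumes the service counts characters the same way wc -m does under a UTF-8 locale, which this page does not specify.

```shell
#!/bin/sh
# Pre-flight check against the documented per-request limits.
# ASSUMPTIONS: arguments are alternating VOICE TEXT pairs (a hypothetical
# convention for this sketch), and characters are counted like `wc -m`
# under a UTF-8 locale; the service's exact counting rule is not stated.
MAX_VOICES=10
MAX_CHARS=5000

# Usage: check_dialogue_limits VOICE TEXT [VOICE TEXT ...]
# Prints a summary and returns 0 if within limits, 1 if over, 2 on bad input.
check_dialogue_limits() {
  if [ "$#" -eq 0 ] || [ $(($# % 2)) -ne 0 ]; then
    echo "expected VOICE TEXT pairs" >&2
    return 2
  fi
  local voices="" chars=0 n distinct
  while [ "$#" -gt 0 ]; do
    # Collect voice IDs one per line so duplicates can be removed with sort -u.
    voices=$(printf '%s\n%s' "$voices" "$1")
    n=$(printf '%s' "$2" | wc -m | tr -d ' ')
    chars=$((chars + n))
    shift 2
  done
  # grep -c . counts non-empty lines, skipping the leading blank one.
  distinct=$(printf '%s\n' "$voices" | sort -u | grep -c .)
  if [ "$distinct" -gt "$MAX_VOICES" ]; then
    echo "too many distinct voices: $distinct (max $MAX_VOICES)" >&2
    return 1
  fi
  if [ "$chars" -gt "$MAX_CHARS" ]; then
    echo "too many characters: $chars (max $MAX_CHARS)" >&2
    return 1
  fi
  echo "ok: $distinct voices, $chars characters"
}
```

Reusing a voice across turns does not count against the voice limit, since only distinct IDs are counted; every turn's text does count toward the shared 5000-character budget.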

Which languages are supported?

Eleven v3 supports 70+ languages. Leave language_code as auto to auto-detect, or pin a specific code such as en, zh, ja, or fr.

Pricing and usage

Clear model options with shared credits.

Per character: see playground (billed by total dialogue text length)

Voices per request: up to 10 (distinct voice IDs)

Character limit: 5000 (total across all turns)

Start building with ElevenLabs Text-to-Dialogue v3 in AI Video API.

Create one API key, use one credit balance, and switch between video and image models without provider-specific plumbing.
