Text-to-Speech
Convert text to natural-sounding audio with ElevenLabs, OpenAI TTS, and MiniMax
POST
Request
Authorization: Bearer nb_YOUR_API_KEY
TTS model slug:
minimax-tts: Free, Chinese/English, very natural
openai-tts: OpenAI standard voices
openai-tts-hd: OpenAI HD quality voices
gpt-4o-mini-tts: GPT-4o Mini TTS
elevenlabs-flash: ElevenLabs fast (low latency)
elevenlabs-v2: ElevenLabs Multilingual v2 (highest quality)
The text to synthesize. Maximum length depends on model (typically 5,000 characters).
Voice identifier. Available voices depend on the model. See examples below.
Speaking speed multiplier. Range: 0.5 – 2.0.
Output audio format: mp3, wav, ogg
Available Voices
OpenAI TTS
alloy, echo, fable, onyx, nova, shimmer
ElevenLabs
ElevenLabs supports hundreds of voices. Use common ones like rachel, adam, bella, josh, or pass any ElevenLabs voice ID directly.
MiniMax TTS
male-qn-qingse, male-qn-jingying, female-shaonv, female-yujie (and more)
Request Example
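A minimal sketch of assembling a request from the parameters described above, using only the Python standard library. The model slugs, speed range, output formats, and the Authorization header come from this page. The endpoint URL, the JSON field names (`model`, `text`, `voice`, `speed`, `format`), and the default speed of 1.0 are assumptions for illustration, since this page does not name them. The request is built but not sent.

```python
import json
import urllib.request

# Values taken from the parameter descriptions on this page.
TTS_MODELS = {
    "minimax-tts", "openai-tts", "openai-tts-hd",
    "gpt-4o-mini-tts", "elevenlabs-flash", "elevenlabs-v2",
}
AUDIO_FORMATS = {"mp3", "wav", "ogg"}

# ASSUMPTION: placeholder endpoint; the real path is not shown on this page.
TTS_URL = "https://api.example.com/tts"


def build_tts_request(api_key, model, text, voice, speed=1.0, audio_format="mp3"):
    """Return an unsent urllib Request for a text-to-speech call."""
    if model not in TTS_MODELS:
        raise ValueError(f"unknown TTS model: {model}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")

    # ASSUMPTION: these JSON field names are hypothetical.
    body = {
        "model": model,
        "text": text,
        "voice": voice,
        "speed": speed,
        "format": audio_format,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_tts_request("nb_YOUR_API_KEY", "openai-tts", "Hello, world.", "alloy")
print(req.get_method(), json.loads(req.data))
```

Validating the speed and format locally keeps obviously invalid requests from reaching the API. To actually send the request, pass it to `urllib.request.urlopen` once the real endpoint and field names are confirmed.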
Response
Speech-to-Text
Transcribe audio files to text using Whisper, GPT-4o Transcribe, or ElevenLabs Scribe.
Request
STT model:
whisper: Fast, multilingual, free
gpt-4o-transcribe: Highest accuracy
elevenlabs-scribe: Best for podcasts and meetings (diarization support)
URL to the audio file (MP3, WAV, OGG, M4A, FLAC). Max 25MB.
ISO 639-1 language code (e.g. en, ru, de). If omitted, the model auto-detects the language.
Speaker diarization (who said what). Only available with elevenlabs-scribe.
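The speech-to-text parameters above can be sketched the same way. The model slugs and the rule that diarization only works with elevenlabs-scribe come from this page. The endpoint URL, the use of POST with a JSON body, and the field names (`model`, `audio_url`, `language`, `diarize`) are assumptions for illustration. The request is built but not sent.

```python
import json
import urllib.request

# Model slugs listed on this page.
STT_MODELS = {"whisper", "gpt-4o-transcribe", "elevenlabs-scribe"}

# ASSUMPTION: placeholder endpoint; the real path is not shown on this page.
STT_URL = "https://api.example.com/stt"


def build_stt_request(api_key, model, audio_url, language=None, diarize=False):
    """Return an unsent urllib Request for a transcription call."""
    if model not in STT_MODELS:
        raise ValueError(f"unknown STT model: {model}")
    if diarize and model != "elevenlabs-scribe":
        raise ValueError("diarization is only available with elevenlabs-scribe")

    # ASSUMPTION: these JSON field names are hypothetical.
    body = {"model": model, "audio_url": audio_url}
    if language is not None:
        # Omitting the language leaves detection to the model.
        body["language"] = language
    if diarize:
        body["diarize"] = True

    return urllib.request.Request(
        STT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_stt_request(
    "nb_YOUR_API_KEY",
    "elevenlabs-scribe",
    "https://example.com/meeting.mp3",
    language="en",
    diarize=True,
)
print(json.loads(req.data))
```

Leaving `language` out of the body when it is not given mirrors the documented auto-detect behavior, rather than sending an empty value.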
