Text-to-Speech

curl --request POST \
  --url https://api.example.com/api/v2/generate \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "model": "<string>",
  "text": "<string>",
  "voice_id": "<string>",
  "speed": 123,
  "format": "<string>",
  "audio_url": "<string>",
  "language": "<string>",
  "diarize": true
}
'

Request

Authorization

string

required

Your API key, sent as a Bearer token: Authorization: Bearer nb_YOUR_API_KEY

model

string

required

TTS model slug:

minimax-tts — Free, Chinese/English, very natural
openai-tts — OpenAI standard voices
openai-tts-hd — OpenAI HD quality voices
gpt-4o-mini-tts — GPT-4o Mini TTS
elevenlabs-flash — ElevenLabs fast (low latency)
elevenlabs-v2 — ElevenLabs Multilingual v2 (highest quality)

text

string

required

The text to synthesize. The maximum length depends on the model (typically 5,000 characters).

voice_id

string

Voice identifier. Available voices depend on the model. See Available Voices below.

speed

number

default:"1.0"

Speaking speed multiplier. Range: 0.5 – 2.0.

format

string

default:"mp3"

Output audio format: mp3, wav, ogg
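
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `build_tts_payload` and the choice to raise `ValueError` are assumptions made for illustration. The defaults, the speed range, and the format list are taken from the parameter descriptions above.

```python
# Hypothetical client-side helper; the API itself may validate differently.
ALLOWED_FORMATS = {"mp3", "wav", "ogg"}  # from the "format" parameter
SPEED_MIN, SPEED_MAX = 0.5, 2.0          # from the "speed" parameter


def build_tts_payload(model, text, voice_id=None, speed=1.0, audio_format="mp3"):
    """Return a JSON-serializable request body for a text-to-speech call."""
    if not model:
        raise ValueError("model is required")
    if not text:
        raise ValueError("text is required")
    if not SPEED_MIN <= speed <= SPEED_MAX:
        raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {"model": model, "text": text, "speed": speed, "format": audio_format}
    # voice_id is optional, so it is only included when the caller sets it.
    if voice_id is not None:
        payload["voice_id"] = voice_id
    return payload
```

Calling `build_tts_payload("openai-tts", "Hello", voice_id="alloy")` yields a body that uses the documented defaults for speed and format.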

Available Voices

OpenAI TTS

alloy, echo, fable, onyx, nova, shimmer

ElevenLabs

ElevenLabs supports hundreds of voices. Use a common voice name such as rachel, adam, bella, or josh, or pass any ElevenLabs voice ID directly.

MiniMax TTS

male-qn-qingse, male-qn-jingying, female-shaonv, female-yujie (and more)

Request Example

curl -X POST https://elumenta.ru/api/v2/generate \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "elevenlabs-v2",
    "text": "Welcome to Elumenta. Your AI-powered creative studio.",
    "voice_id": "rachel",
    "speed": 1.0,
    "format": "mp3"
  }'
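
The same request can be issued from Python. This is a rough equivalent of the curl call above using only the standard library. The helper names `make_request` and `generate` and the 60-second timeout are assumptions for illustration. The URL and headers are copied from the curl example.

```python
import json
import urllib.request

API_URL = "https://elumenta.ru/api/v2/generate"  # endpoint from the curl example


def make_request(api_key, payload):
    """Build (but do not send) the POST request for a generation call."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def generate(api_key, payload, timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    request = make_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `generate("nb_YOUR_API_KEY", {"model": "elevenlabs-v2", "text": "Welcome to Elumenta.", "voice_id": "rachel"})` would send the request. Splitting request construction from sending keeps the headers and body easy to inspect without network access.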

Response

{
  "id": "gen_01j9x2tts001",
  "status": "completed",
  "model": "elevenlabs-v2",
  "url": "https://storage.elumenta.ru/generations/gen_01j9x2tts001.mp3",
  "duration_seconds": 3.4,
  "characters": 54,
  "tokens_used": 5,
  "balance_remaining": 284,
  "created_at": "2026-03-08T12:10:00Z"
}
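
A response like the one above could be read as follows. This is a sketch under assumptions: the `TTSResult` class and `parse_tts_response` function are hypothetical, and treating any status other than "completed" as an error is a guess, since only the "completed" status is shown here.

```python
import json
from dataclasses import dataclass


@dataclass
class TTSResult:
    """Subset of the response fields shown in the example above."""
    id: str
    url: str
    duration_seconds: float
    tokens_used: int
    balance_remaining: int


def parse_tts_response(raw):
    """Parse a JSON response string and return the fields needed to fetch the audio."""
    data = json.loads(raw)
    # Assumption: only "completed" responses carry a usable audio URL.
    if data.get("status") != "completed":
        raise RuntimeError(f"generation not completed: {data.get('status')!r}")
    return TTSResult(
        id=data["id"],
        url=data["url"],
        duration_seconds=float(data["duration_seconds"]),
        tokens_used=int(data["tokens_used"]),
        balance_remaining=int(data["balance_remaining"]),
    )
```

The `url` field of the result points at the generated audio file, which can then be downloaded with any HTTP client.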

Speech-to-Text

Transcribe audio files to text using Whisper, GPT-4o Transcribe, or ElevenLabs Scribe.

Request

model

string

required

STT model:

whisper — Fast, multilingual, free
gpt-4o-transcribe — Highest accuracy
elevenlabs-scribe — Best for podcasts and meetings (diarization support)

audio_url

string

required

URL of the audio file (MP3, WAV, OGG, M4A, or FLAC). Maximum file size: 25 MB.

language

string

ISO 639-1 language code (e.g. en, ru, de). If omitted, the model auto-detects.

diarize

boolean

default:"false"

Enables speaker diarization (labels who said what). Only available with elevenlabs-scribe.
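
The constraints on these parameters can be enforced before a request is sent. This is a minimal sketch. The function name `build_stt_payload` is hypothetical, and the two-letter check is only a shape check for ISO 639-1 codes, not a lookup against the full code list.

```python
def build_stt_payload(model, audio_url, language=None, diarize=False):
    """Return a JSON-serializable request body for a speech-to-text call."""
    if not model:
        raise ValueError("model is required")
    if not audio_url:
        raise ValueError("audio_url is required")
    # Diarization is documented as available only with elevenlabs-scribe.
    if diarize and model != "elevenlabs-scribe":
        raise ValueError("diarize is only available with elevenlabs-scribe")

    payload = {"model": model, "audio_url": audio_url}
    if language is not None:
        # ISO 639-1 codes are two lowercase ASCII letters, for example "en".
        if not (len(language) == 2 and language.isascii()
                and language.isalpha() and language.islower()):
            raise ValueError("language must be a two-letter ISO 639-1 code")
        payload["language"] = language
    if diarize:
        payload["diarize"] = True
    return payload
```

Omitting `language` leaves the field out of the body, so the model auto-detects the language as described above.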

Request Example

curl -X POST https://elumenta.ru/api/v2/generate \
  -H "Authorization: Bearer nb_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper",
    "audio_url": "https://example.com/interview.mp3",
    "language": "en"
  }'

Response

{
  "id": "gen_01j9x2stt001",
  "status": "completed",
  "model": "whisper",
  "text": "Hello and welcome to today's episode...",
  "language": "en",
  "duration_seconds": 142.5,
  "tokens_used": 0,
  "created_at": "2026-03-08T12:15:00Z"
}
